Tokens and usage
How we count usage, and which features use more tokens.
What are tokens?
Tokens are the units the AI model uses to read and write text. Roughly, one token is about four characters or three-quarters of a word in English. Your messages, the AI's replies, and the context we send (instructions, summaries) all consume tokens. The usage bar in Settings (Dashboard) shows how many tokens you've used since your last allowance reset and your plan's cap.
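The rules of thumb above (about four characters, or three-quarters of a word, per token) can be turned into a rough estimate. This is only a sketch of the heuristic; `estimate_tokens` is a hypothetical helper, and real tokenizers vary by model.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate from the ~4-chars / ~0.75-words-per-token rules."""
    by_chars = len(text) / 4              # ~4 characters per token
    by_words = len(text.split()) / 0.75   # ~3/4 of a word per token
    return round((by_chars + by_words) / 2)

print(estimate_tokens("Tokens are the units the model reads and writes."))
```

The two heuristics usually land close to each other for ordinary English; averaging them smooths out very short or very long words.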
Why does each message use so much?
AI models don't have human memory. Every time you press Send, we have to bundle your new message, the whole conversation so far, and a large invisible instruction set (how the AI should act, knowledge, filters, and more) and send it all to the server at once. For every reply, the model is effectively reading one big text: the instructions plus all previous messages, again and again. It's like a snowball rolling down a hill: the longer the chat, the more we send each turn.
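The snowball effect can be shown with a little arithmetic. The numbers here are made up for illustration (a fixed instruction block and an average message size), not our actual figures.

```python
INSTRUCTIONS = 2000   # invisible instruction set, resent every turn (example value)
PER_MESSAGE = 150     # average tokens per message (example value)

def tokens_for_turn(turn: int) -> int:
    """Tokens sent on turn N: instructions + full history + the new message."""
    # Before turn N the history holds 2*(N-1) messages (one user + one AI
    # message per completed turn), plus the new user message being sent now.
    history = 2 * (turn - 1) + 1
    return INSTRUCTIONS + history * PER_MESSAGE

for t in (1, 5, 20):
    print(f"turn {t}: ~{tokens_for_turn(t)} tokens sent")
```

Even with modest messages, turn 20 sends several times as many tokens as turn 1, which is why long chats use their allowance faster.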
What counts toward usage?
Every request counts your message, the AI's reply, and the context sent alongside them. That context includes your personalization instructions, any imported style summary (Pro), and, for long chats, a compressed summary of older messages. So longer conversations and longer messages use more tokens than short ones.
What increases token use?
- Long conversations – More messages and a longer history (or its summary) mean more tokens per request.
- Web search (all plans) – When you turn on Web for a message, we add search results and page content to the context, which increases tokens for that turn.
- Cross-chat memory (all plans) – When enabled, we add short summaries of your other chats to the context, so each request can use more tokens.
- Imported context (Pro & Unlimited) – If you've imported ChatGPT data, the style and topic summary is sent with each message until you turn it off in the Import tab.
- Image generation (Pro & Unlimited) – Each image request uses tokens for the prompt and the model's response.
- Regenerating – Each regeneration is a new request and counts again.
Context window and long chats
The model has a large context window (128k tokens). For each message we send the most recent part of the conversation (up to 50 messages) so the AI can follow the thread. If a chat gets very long, we keep the latest exchange and a summary of earlier context where applicable. You can continue in the same chat; there is no hard stop. Your monthly token allowance is separate and resets each billing period.
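The trimming described above can be sketched as follows. The 50-message cap comes from the text; the summary string is a placeholder for the compressed summary, not our actual summarization logic.

```python
def build_context(messages: list[str], cap: int = 50) -> list[str]:
    """Keep the newest `cap` messages; replace older ones with a summary line."""
    if len(messages) <= cap:
        return messages
    older, recent = messages[:-cap], messages[-cap:]
    summary = f"[summary of {len(older)} earlier messages]"  # placeholder
    return [summary] + recent

ctx = build_context([f"msg {i}" for i in range(60)])
print(len(ctx))
```

The request stays bounded no matter how long the chat grows, which is why there is no hard stop on a conversation.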
How can I use fewer tokens?
- Start a new chat when a thread gets very long.
- Turn off Web search when you don't need live results.
- Turn off Cross-chat memory in Settings if you don't need it.
- On Pro, temporarily turn off Use imported context in chat in the Import tab.
Your usage resets at the start of each billing period (Plus, Pro & Unlimited). On Unlimited there is also a fair-use limit of 100 messages per hour (it resets every hour); this limit exists for bot protection and is practically impossible to reach in a normal conversation. The Free plan is limited to the free token allowance.
More questions? See our FAQ for pricing, plans, refunds, and feature details.