Every request you send to GPT-4, Claude, or Gemini is measured and billed in tokens — not characters, not words. Understanding how tokenization works is essential for controlling costs, fitting prompts within context windows, and building efficient LLM applications.
What Is a Token?
A token is a chunk of text as seen by the model's tokenizer. Most modern tokenizers use Byte-Pair Encoding (BPE) or a close variant, splitting text into the subword units that appeared most frequently in the training data.
Rules of thumb (exact splits vary by tokenizer):
"Hello, world!" → ["Hello", ",", " world", "!"] = 4 tokens
A common long word like "authentication" → 1–2 tokens
A rare word like "supercalifragilistic" → 6+ tokens
In English, 1 token ≈ 4 characters ≈ ¾ of a word
Tokenizers by Model Family
| Model Family | Tokenizer | Vocab Size |
|---|---|---|
| GPT-4 / GPT-3.5 | cl100k_base | 100,277 |
| GPT-4o | o200k_base | ~200k |
| Claude (all) | Anthropic's own BPE | ~100k |
| Gemini | SentencePiece | ~256k |
| LLaMA 2 / Mistral | SentencePiece (BPE) | 32,000 |
| LLaMA 3 | tiktoken-style BPE | ~128k |
Different tokenizers count the same text differently. A 1,000-token GPT-4 prompt may be 1,050 tokens with Claude.
Context Windows in 2025
| Model | Context Window | Notes |
|---|---|---|
| GPT-4o | 128,000 tokens | Standard version |
| GPT-4 Turbo | 128,000 tokens | |
| Claude 3.5 Sonnet | 200,000 tokens | |
| Gemini 1.5 Pro | 1,000,000 tokens | |
| Gemini 1.5 Flash | 1,000,000 tokens | |
| LLaMA 3.1 | 128,000 tokens | |
How API Costs Work
You pay for input tokens (your prompt + system prompt + conversation history) and output tokens (the model's response). Per token, output is typically 3–5× more expensive, as the pricing table below shows.
Total cost = (input_tokens × input_price) + (output_tokens × output_price)
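The formula can be wrapped in a small helper. A sketch, using the per-1M-token prices from the table below (prices drift over time, so check your provider's current price sheet):

```python
# Approximate per-1M-token prices (USD); verify against current provider pricing.
PRICES = {
    "gpt-4o":      {"input": 5.00, "output": 15.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single request: input and output billed separately."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A 2,000-token prompt with a 500-token reply on GPT-4o:
print(f"${request_cost('gpt-4o', 2000, 500):.4f}")  # → $0.0175
```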
Approximate pricing (per 1M tokens):
| Model | Input | Output |
|---|---|---|
| GPT-4o | $5 | $15 |
| GPT-4o mini | $0.15 | $0.60 |
| Claude 3.5 Sonnet | $3 | $15 |
| Gemini 1.5 Pro | $3.50 | $10.50 |
| Gemini 1.5 Flash | $0.075 | $0.30 |
Counting Tokens Before Sending
Use the OpenAI tiktoken library:
import tiktoken
enc = tiktoken.encoding_for_model("gpt-4o")
text = "Hello, how are you today?"
tokens = enc.encode(text)
print(f"Token count: {len(tokens)}")
For real-time token counting without code, use our Token Counter tool — paste any text and instantly see token counts for GPT-4, Claude, and Gemini, plus estimated API costs.
Strategies to Reduce Token Usage
1. Use a Smaller Model for Simple Tasks
GPT-4o mini costs roughly 33× less than GPT-4o on input tokens ($0.15 vs. $5 per 1M) and 25× less on output. For classification, extraction, or summarization-style tasks, smaller models often match quality at a fraction of the cost.
2. Compress System Prompts
The full system prompt is sent with every API call. A 500-token system prompt across 10,000 daily requests adds 5M tokens/day.
Before:
You are a helpful assistant that answers questions in a friendly,
professional, and concise manner. Always be respectful. Never
use profanity. Format your responses clearly.
After (same core instructions, roughly a third of the tokens):
Answer concisely, professionally. No profanity. Format clearly.
3. Manage Conversation History
In chat applications, the entire conversation history is sent with each message. Cap history to the last N messages or summarize older turns:
MAX_HISTORY = 10
messages = [system_message] + conversation_history[-MAX_HISTORY:]
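The cap-plus-summary approach can be sketched as follows. Here `summarize_turns` is a placeholder: in a real application it would call a cheap model to compress the older turns into one short message.

```python
MAX_HISTORY = 10

def summarize_turns(turns):
    # Placeholder: in practice, ask a cheap model to compress these turns.
    return {"role": "system", "content": f"[Summary of {len(turns)} earlier messages]"}

def build_messages(system_message, history):
    """Keep the last MAX_HISTORY turns verbatim; fold older turns into a summary."""
    if len(history) <= MAX_HISTORY:
        return [system_message] + history
    older, recent = history[:-MAX_HISTORY], history[-MAX_HISTORY:]
    return [system_message, summarize_turns(older)] + recent

history = [{"role": "user", "content": f"msg {i}"} for i in range(25)]
msgs = build_messages({"role": "system", "content": "Answer concisely."}, history)
print(len(msgs))  # system + summary + 10 recent = 12
```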
4. Use RAG Instead of Large Context
Rather than stuffing an entire document into context, use Retrieval-Augmented Generation (RAG) to fetch only the relevant chunks (typically 512–1024 tokens each).
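A toy illustration of the retrieval step, using word-overlap scoring as a stand-in for a real embedding model (production RAG systems score chunks by embedding similarity, not shared words):

```python
def score(query: str, chunk: str) -> int:
    # Toy relevance metric: count shared lowercase words.
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return only the k most relevant chunks instead of the whole document."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

chunks = [
    "Refunds are processed within 5 business days.",
    "Our office is located in Berlin.",
    "To request a refund, email support with your order number.",
]
print(retrieve("how do I get a refund", chunks, k=1))
```

Only the retrieved chunks (typically 512–1024 tokens each) go into the prompt, so input size stays flat no matter how large the document collection grows.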
5. Request Structured Output
JSON mode or function calling often produces shorter, more predictable outputs than prose explanations.
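With the OpenAI Chat Completions API, for example, JSON mode is a single request parameter. A sketch of the request payload (the actual API call is commented out since it needs a key; note the messages must mention "JSON" for JSON mode to be accepted):

```python
request = {
    "model": "gpt-4o-mini",
    "response_format": {"type": "json_object"},  # forces syntactically valid JSON
    "messages": [
        {"role": "system", "content": 'Extract sentiment as JSON: {"sentiment": ...}'},
        {"role": "user", "content": "I love this product!"},
    ],
}
# client = openai.OpenAI()
# resp = client.chat.completions.create(**request)
print(request["response_format"]["type"])  # → json_object
```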
6. Batch Where Possible
For non-interactive tasks, use batch APIs (OpenAI Batch API, Anthropic Message Batches) at a 50% discount.
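With the OpenAI Batch API, requests are written as one JSON object per line (JSONL) and uploaded as a file before the batch job is created. A sketch of building that input:

```python
import json

prompts = ["Summarize: report A", "Summarize: report B"]

lines = []
for i, prompt in enumerate(prompts):
    lines.append(json.dumps({
        "custom_id": f"task-{i}",          # used to match results back to requests
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": prompt}],
        },
    }))

batch_input = "\n".join(lines)  # write to a .jsonl file, upload, then create the batch
print(len(batch_input.splitlines()))  # → 2
```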
Token Budget Planning
For production LLM applications, calculate your monthly token budget:
Monthly tokens = daily_requests × (avg_input_tokens + avg_output_tokens) × 30
Monthly cost = monthly_tokens / 1,000,000 × price_per_million
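The two formulas above as a quick sketch, with illustrative traffic numbers and GPT-4o mini prices plugged in:

```python
def monthly_cost(daily_requests, avg_input, avg_output,
                 input_price_per_m, output_price_per_m, days=30):
    """Estimated monthly spend in USD, billing input and output separately."""
    in_tokens = daily_requests * avg_input * days
    out_tokens = daily_requests * avg_output * days
    return (in_tokens * input_price_per_m + out_tokens * output_price_per_m) / 1_000_000

# 5,000 requests/day, 800 input + 300 output tokens each, at GPT-4o mini prices:
print(f"${monthly_cost(5000, 800, 300, 0.15, 0.60):.2f}")  # → $45.00
```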
Use our Token Counter to measure your typical prompt sizes and estimate costs before scaling.
