Foundations · 9 min read

How Language Models Actually Work

What's actually happening inside ChatGPT and Claude — explained without the jargon.

Tokens, not words

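Language models don't read words; they read tokens. A token is a chunk of text, often a whole short word or a piece of a longer one, and in English it averages roughly 3/4 of a word. Depending on the tokenizer, the word "hamburger" might be split into "ham," "bur," and "ger," while common short words like "the" or "is" are single tokens.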
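To make the splitting concrete, here is a minimal sketch of a toy tokenizer. It is not how production tokenizers work (those typically learn their vocabulary from data with methods such as byte-pair encoding); the vocabulary `TOY_VOCAB` and the function `toy_tokenize` are hypothetical names made up for this illustration.

```python
# Toy vocabulary: hypothetical, chosen only so the article's example works.
TOY_VOCAB = {"ham", "bur", "ger", "the", "is", " "}

def toy_tokenize(text, vocab=TOY_VOCAB):
    """Greedy longest-match split; unknown text falls back to single characters."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate substring starting at position i first.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Nothing in the vocabulary matched: emit one character as a token.
            tokens.append(text[i])
            i += 1
    return tokens

print(toy_tokenize("hamburger"))  # ['ham', 'bur', 'ger']
print(toy_tokenize("the"))        # ['the']
```

The point of the sketch is only that one word can become several tokens while another stays whole, depending on what the vocabulary contains.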

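Why does this matter? Because every model has a context window, a limit on how many tokens it can process at once. GPT-4o has a 128k-token context window; Claude models have offered 200k. Think of it as working memory. The window has to hold both your input and the model's reply. If a conversation outgrows it, the model simply cannot see the overflow: depending on the application, the request is rejected or the earliest text is dropped or truncated.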
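As a rough sketch of what "doesn't fit in the window" can mean in practice, the snippet below estimates a token count from the 3/4-word rule of thumb and then keeps only the most recent tokens. Both helpers (`estimate_tokens`, `fit_to_window`) are hypothetical, and dropping the oldest tokens is just one strategy an application might use; it is an assumption here, not a description of any particular product.

```python
import math

def estimate_tokens(text):
    """Rough estimate: about 4 tokens for every 3 whitespace-separated words."""
    words = len(text.split())
    return math.ceil(words * 4 / 3)

def fit_to_window(tokens, max_tokens):
    """Keep only the most recent max_tokens tokens, dropping the oldest."""
    if max_tokens <= 0:
        return []
    return tokens[-max_tokens:]

history = ["t1", "t2", "t3", "t4", "t5"]
visible = fit_to_window(history, 3)
print(visible)  # ['t3', 't4', 't5']: t1 and t2 are no longer seen by the model
```

For exact counts you would use the tokenizer that belongs to the specific model rather than a word-based estimate.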

Training: what actually happened

An LLM is trained on a huge dataset of text: web pages, books, code, articles, forums. The exact size for GPT-4 and Claude has not been made public, but models of this class are generally trained on trillions of tokens.

During training, the model sees a stretch of text with the next token hidden and tries to predict it. It assigns a probability to every possible token, is scored on how much probability it gave to the token that actually came next, and adjusts its parameters slightly to do better. Repeat this across trillions of tokens, and the model develops a very sophisticated grasp of language, facts, and reasoning, not because anyone programmed those things in, but because they emerged from pattern recognition at scale.
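The predict-score-adjust loop can be sketched in a few lines. This is a deliberately tiny stand-in: the "parameters" are just three raw scores (logits) over a three-token vocabulary, whereas a real model has billions of parameters inside a neural network. The function names and the learning rate are assumptions made for illustration.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_index):
    """Loss is low when the correct next token received high probability."""
    return -math.log(softmax(logits)[target_index])

def training_step(logits, target_index, lr=0.5):
    """One gradient-descent step; d(loss)/d(logit_i) = prob_i - one_hot_i."""
    probs = softmax(logits)
    grads = [p - (1.0 if i == target_index else 0.0) for i, p in enumerate(probs)]
    return [x - lr * g for x, g in zip(logits, grads)]

logits = [0.0, 0.0, 0.0]   # the toy model starts with no preference
target = 1                 # index of the token that actually came next
before = cross_entropy(logits, target)
after = cross_entropy(training_step(logits, target), target)
print(before > after)      # True: the step made the correct token more likely
```

Each step nudges probability toward the token that actually appeared in the training text; nothing in the loop encodes grammar or facts directly.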

Generation: one token at a time

When you send a message, the model doesn't think for a moment and then output an answer. It generates one token at a time, each one based on everything before it.

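This is why you see the text stream in: it's literally generating token by token. At each step the model assigns a probability to every possible next token, one token is chosen (usually the most likely one or something close to it), and that choice is appended to the text and informs the next pick. This repeats until the model produces a special end-of-sequence token or hits a length limit.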
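Here is a minimal sketch of that loop. The lookup table `TOY_MODEL` is a hypothetical stand-in for a real model, and it cheats in one important way: it looks only at the last token, whereas a real LLM conditions on everything in the context. The `"<end>"` marker and the function names are likewise assumptions for illustration.

```python
import random

# Hypothetical next-token probabilities, standing in for a real model.
TOY_MODEL = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "<end>": 0.3},
    "dog": {"sat": 0.5, "<end>": 0.5},
    "sat": {"<end>": 1.0},
}

def next_token_probs(tokens):
    """Receives the full context, but this toy only uses the last token."""
    last = tokens[-1] if tokens else None
    return TOY_MODEL.get(last)

def generate(prompt_tokens, max_new_tokens=10, rng=None):
    rng = rng or random.Random()
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):           # length limit
        dist = next_token_probs(tokens)
        if dist is None:                      # toy model has nothing to offer
            break
        choices = list(dist)
        weights = [dist[t] for t in choices]
        # Sample one token according to its probability.
        next_token = rng.choices(choices, weights=weights, k=1)[0]
        if next_token == "<end>":             # end-of-sequence marker
            break
        tokens.append(next_token)             # feeds into the next step
    return tokens

print(generate(["the"]))
```

Running it several times can give different continuations, because each step samples from a distribution instead of following a fixed script.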

Temperature: the randomness dial

Temperature is a setting that controls how random the model's token selection is. At low temperature, it almost always picks the most likely next token — predictable, consistent, sometimes repetitive. At high temperature, it occasionally picks less likely tokens — more creative, more varied, sometimes less coherent.
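One common way temperature is applied is to divide the model's raw scores (logits) by the temperature before converting them to probabilities. The sketch below assumes that formulation; the function name and the example scores are made up for illustration.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores to probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 2) for p in probs])
```

With these scores, the top token's probability is about 0.99 at temperature 0.2, about 0.67 at 1.0, and about 0.51 at 2.0, so lower temperatures concentrate the choice on the favorite and higher ones spread it out.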

Most interfaces set this for you. But knowing it exists helps you understand why the same prompt can give different results each time.

Why it hallucinates

Hallucination is when an AI confidently states something false. It's one of the most important limitations to understand.

LLMs generate plausible text. They have no built-in fact-checking mechanism and no reliable way to distinguish between "I know this" and "this sounds right based on patterns." So when asked about something their training data covered thinly or not at all, they still generate what seems most plausible, which can be completely wrong.

The practical rule: verify anything factual before you rely on it. For facts, go to primary sources, or use a search tool such as Perplexity that cites its sources so you can check them. Use LLMs for reasoning, writing, and generation.

RLHF: why it doesn't just say whatever

After initial training, models go through Reinforcement Learning from Human Feedback (RLHF). Human raters evaluate and compare model responses, and those judgments are used to train the model toward the responses people prefer: helpful, accurate, safe.
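One widely described way to turn rater preferences into a training signal is to train a separate scoring model so that the response raters preferred gets a higher score than the one they rejected. The article does not say which method any particular vendor uses, so treat the sketch below as an assumption: `preference_loss` is a hypothetical name, and the scores are made-up numbers.

```python
import math

def preference_loss(score_chosen, score_rejected):
    """Pairwise loss -log(sigmoid(chosen - rejected)), computed stably."""
    diff = score_chosen - score_rejected
    # log(1 + exp(-diff)), arranged so exp() never sees a large positive input.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))

# Scoring model agrees with the rater: small loss.
agrees = preference_loss(2.0, 0.0)
# Scoring model disagrees with the rater: larger loss, so it gets corrected.
disagrees = preference_loss(0.0, 2.0)
print(agrees < disagrees)  # True
```

Once such a scoring model exists, the language model can be further trained to produce responses that score well, which is the reinforcement learning part of the name.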

This is why Claude and ChatGPT feel like they're trying to be useful rather than just predicting text. The RLHF process shaped their "personality" and behavior on top of the base language model.

Takeaway

LLMs generate text token by token based on patterns from training data. They don't retrieve facts — they predict them. Understanding this makes you a better user of AI.