Overview

Coding agents are software that acts as a harness for an LLM, extending its capabilities with additional tools and invisible prompts. Understanding how they combine a language model with external capabilities helps developers apply these tools more effectively.

The Breakdown

  • LLMs work by token completion - converting text to integer sequences and repeatedly predicting the next token, which is why providers charge by token count and impose context-length limits
  • Chat interfaces use templated prompts that simulate conversation by replaying the entire chat history with each new request, making longer conversations progressively more expensive
  • Modern LLMs are multimodal - they can process images, sketches and screenshots by converting them directly into tokens alongside text, not through separate OCR systems
  • LLMs are stateless - they start fresh with each prompt, so maintaining conversation requires external software to manage and replay context
  • Coding agents extend basic LLMs with callable tools and invisible prompts that provide capabilities beyond simple text completion
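The text-to-integers step in the first bullet can be sketched in a few lines. Real LLM tokenizers use subword schemes such as byte-pair encoding; the word-level vocabulary below is a deliberate simplification for illustration only.

```python
# Toy illustration of text -> integer tokens. Real tokenizers split text
# into subwords (e.g. BPE), not whole words; this is a simplified sketch.
def build_vocab(corpus: str) -> dict[str, int]:
    """Assign each distinct word a stable integer id."""
    return {word: i for i, word in enumerate(dict.fromkeys(corpus.split()))}

def encode(text: str, vocab: dict[str, int]) -> list[int]:
    """Convert text to the integer sequence the model actually sees."""
    return [vocab[word] for word in text.split()]

vocab = build_vocab("the model predicts the next token")
print(encode("the next token", vocab))  # → [0, 3, 4]
```

The model never sees characters, only these integer ids, which is why pricing and limits are expressed in tokens rather than words or characters.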
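The statelessness and history-replay bullets combine into one observable effect: each request resends the whole transcript, so cost grows every turn. A minimal sketch, assuming a stand-in token counter (plain word count, not a real tokenizer) and placeholder replies instead of a live model:

```python
# Why chat gets progressively more expensive: the model is stateless, so
# the client replays the entire transcript with every request.
def count_tokens(messages: list[dict[str, str]]) -> int:
    # Stand-in: approximate token count by word count.
    return sum(len(m["content"].split()) for m in messages)

history: list[dict[str, str]] = [
    # The "invisible" templated prompt the user never sees.
    {"role": "system", "content": "You are a helpful coding assistant."}
]

costs = []
for user_turn in ["Explain tokenization", "Now show an example", "Shorter please"]:
    history.append({"role": "user", "content": user_turn})
    costs.append(count_tokens(history))          # full history sent each time
    history.append({"role": "assistant", "content": "…model reply…"})

print(costs)  # strictly increasing: every turn pays for all prior turns again
```

Because the full history rides along on every request, the per-turn cost is roughly the sum of everything said so far, not just the newest message.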
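The last bullet, the harness itself, can be sketched as a loop: the LLM only ever emits text, so the harness watches for a structured tool request in the reply, executes the named tool, and replays the result back into the context. Everything here (the `fake_llm`, the `read_file` tool, the JSON call format) is invented for illustration and does not reflect any particular vendor's API.

```python
import json

# Hypothetical tool registry; a real agent would do actual file I/O,
# shell commands, search, etc.
TOOLS = {
    "read_file": lambda path: f"contents of {path}",
}

def fake_llm(prompt: str) -> str:
    """Stand-in model: first requests a tool, then answers in plain text."""
    if "contents of" in prompt:
        return "The file defines a main() function."
    return json.dumps({"tool": "read_file", "args": {"path": "main.py"}})

def run_agent(user_request: str) -> str:
    context = user_request
    while True:
        reply = fake_llm(context)
        try:
            call = json.loads(reply)        # did the model request a tool?
        except json.JSONDecodeError:
            return reply                    # plain text means a final answer
        result = TOOLS[call["tool"]](**call["args"])
        context += f"\n[tool result] {result}"  # replay result into context

print(run_agent("What does main.py do?"))
```

The loop makes the earlier bullets concrete: the model stays a stateless text completer, and every capability beyond completion lives in the harness that parses its output and feeds results back in.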