Overview

Coding agents are software that acts as a harness for an LLM, extending its capabilities with additional tools and invisible prompts. Understanding how they combine a language model with external capabilities helps developers apply these tools more effectively.

The Breakdown

  • LLMs work by token completion - converting text to integer sequences and repeatedly predicting the next token, which is why providers charge by token count and impose context-length limits
  • Chat interfaces use templated prompts that simulate conversation by replaying the entire chat history with each new request, making longer conversations progressively more expensive
  • Modern LLMs are multimodal - they can process images, sketches and screenshots by converting them directly into tokens alongside text, not through separate OCR systems
  • LLMs are stateless - they start fresh with each prompt, so maintaining conversation requires external software to manage and replay context
  • Coding agents extend basic LLMs with callable tools and invisible prompts that provide capabilities beyond simple text completion
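The text-to-integers step in the first bullet can be sketched in a few lines. Real LLM tokenizers use subword schemes such as byte-pair encoding; the word-level vocabulary below is a deliberate simplification for illustration only.

```python
# Toy illustration of text -> integer tokens. Real tokenizers split text
# into subwords (e.g. BPE), not whole words; this is a simplified sketch.
def build_vocab(corpus: str) -> dict[str, int]:
    """Assign each distinct word a stable integer id."""
    return {word: i for i, word in enumerate(dict.fromkeys(corpus.split()))}

def encode(text: str, vocab: dict[str, int]) -> list[int]:
    """Convert text to the integer sequence the model actually sees."""
    return [vocab[word] for word in text.split()]

vocab = build_vocab("the model predicts the next token")
print(encode("the next token", vocab))  # → [0, 3, 4]
```

The model never sees characters, only these integer ids, which is why pricing and limits are expressed in tokens rather than words or characters.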
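The statelessness and history-replay bullets combine into one observable effect: each request resends the whole transcript, so cost grows every turn. A minimal sketch, assuming a stand-in token counter (plain word count, not a real tokenizer) and placeholder replies instead of a live model:

```python
# Why chat gets progressively more expensive: the model is stateless, so
# the client replays the entire transcript with every request.
def count_tokens(messages: list[dict[str, str]]) -> int:
    # Stand-in: approximate token count by word count.
    return sum(len(m["content"].split()) for m in messages)

history: list[dict[str, str]] = [
    # The "invisible" templated prompt the user never sees.
    {"role": "system", "content": "You are a helpful coding assistant."}
]

costs = []
for user_turn in ["Explain tokenization", "Now show an example", "Shorter please"]:
    history.append({"role": "user", "content": user_turn})
    costs.append(count_tokens(history))          # full history sent each time
    history.append({"role": "assistant", "content": "…model reply…"})

print(costs)  # strictly increasing: every turn pays for all prior turns again
```

Because the full history rides along on every request, the per-turn cost is roughly the sum of everything said so far, not just the newest message.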
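The last bullet, the harness itself, can be sketched as a loop: the LLM only ever emits text, so the harness watches for a structured tool request in the reply, executes the named tool, and replays the result back into the context. Everything here (the `fake_llm`, the `read_file` tool, the JSON call format) is invented for illustration and does not reflect any particular vendor's API.

```python
import json

# Hypothetical tool registry; a real agent would do actual file I/O,
# shell commands, search, etc.
TOOLS = {
    "read_file": lambda path: f"contents of {path}",
}

def fake_llm(prompt: str) -> str:
    """Stand-in model: first requests a tool, then answers in plain text."""
    if "contents of" in prompt:
        return "The file defines a main() function."
    return json.dumps({"tool": "read_file", "args": {"path": "main.py"}})

def run_agent(user_request: str) -> str:
    context = user_request
    while True:
        reply = fake_llm(context)
        try:
            call = json.loads(reply)        # did the model request a tool?
        except json.JSONDecodeError:
            return reply                    # plain text means a final answer
        result = TOOLS[call["tool"]](**call["args"])
        context += f"\n[tool result] {result}"  # replay result into context

print(run_agent("What does main.py do?"))
```

The loop makes the earlier bullets concrete: the model stays a stateless text completer, and every capability beyond completion lives in the harness that parses its output and feeds results back in.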