Overview
Simon Willison built a Unicode character lookup tool that demonstrates how HTTP range requests enable efficient binary search over large remote files without downloading the entire dataset. The tool searches through 76.6MB of Unicode metadata by fetching only the small byte range needed for each step of the search.
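Each step of such a search comes down to one HTTP request carrying a `Range` header. The Python sketch below shows what a single step could look like with only the standard library, assuming a server that supports byte ranges. `range_header` and `fetch_range` are hypothetical names, not the tool's own code. The `Accept-Encoding: identity` header and the `Content-Encoding` check are added precautions related to the compression caveat.

```python
import urllib.request


def range_header(start: int, length: int) -> str:
    """Build a Range header value covering `length` bytes from offset `start`.

    HTTP byte ranges are inclusive at both ends, so the last byte is start + length - 1.
    """
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    return f"bytes={start}-{start + length - 1}"


def fetch_range(url: str, start: int, length: int) -> bytes:
    """Fetch one slice of a remote file with a single HTTP range request (hypothetical helper)."""
    request = urllib.request.Request(
        url,
        headers={
            "Range": range_header(start, length),
            # Ask for the unencoded bytes so the offsets match the original file.
            "Accept-Encoding": "identity",
        },
    )
    with urllib.request.urlopen(request) as response:
        # 206 Partial Content means the range was honoured; a plain 200 means the
        # server ignored the Range header and is sending the whole file.
        if response.status != 206:
            raise RuntimeError(f"Range header ignored (HTTP {response.status})")
        # A compressed body would make the byte offsets meaningless.
        if response.headers.get("Content-Encoding", "identity") != "identity":
            raise RuntimeError("response is compressed; offsets would not line up")
        return response.read()
```

For example, `fetch_range(url, 0, 4096)` would send `Range: bytes=0-4095` and return the first 4KB of the file, or raise if the server does not honour the range.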
The Breakdown
- Binary search over HTTP range requests - fetches only specific byte ranges from a remote 76.6MB Unicode file instead of downloading the entire dataset; each request reads a small slice near the middle of the remaining range and halves it, so a search completes in ~17 steps with under 4KB transferred
- HTTP compression compatibility issues - range requests don’t work with compressed responses, because the requested byte offsets would refer to the compressed stream rather than the original file; CDNs like Cloudflare sidestep this by automatically disabling compression when a Range header is present
- AI-assisted development workflow - used Claude to brainstorm use cases for binary search, generate specifications, and implement the working code through an asynchronous research process
- Unicode codepoint lookup mechanism - searches the sorted Unicode metadata by either a typed character (like ‘ø’) or a hex codepoint (like ‘1F99C’) and returns details about that character, including its category and Unicode block
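The lookup described in these bullets can be sketched as a binary search over fixed-width records. The layout here is a hypothetical one chosen for illustration, not necessarily the format of Willison's file: every record is exactly `RECORD_SIZE` bytes, records are sorted by codepoint, and each holds tab-separated codepoint, name, category and block fields padded with spaces. The `read_range(start, length)` callback stands in for an HTTP range request and is stubbed with an in-memory byte string. `RECORD_SIZE`, `encode_record`, `parse_query` and `lookup` are all hypothetical names.

```python
from typing import Callable, Optional

# Hypothetical fixed-width layout: each record is RECORD_SIZE bytes, sorted by
# codepoint, e.g. "00F8\tLATIN SMALL LETTER O WITH STROKE\tLl\tLatin-1 Supplement"
# padded with spaces and terminated by "\n".
RECORD_SIZE = 128

# read_range(start, length) -> bytes; in practice this would be an HTTP range request.
ReadRange = Callable[[int, int], bytes]

SAMPLE_ROWS = [
    (0x0041, "LATIN CAPITAL LETTER A", "Lu", "Basic Latin"),
    (0x00F8, "LATIN SMALL LETTER O WITH STROKE", "Ll", "Latin-1 Supplement"),
    (0x1F99C, "PARROT", "So", "Supplemental Symbols and Pictographs"),
]


def encode_record(codepoint: int, name: str, category: str, block: str) -> bytes:
    """Serialize one row into exactly RECORD_SIZE bytes."""
    raw = f"{codepoint:04X}\t{name}\t{category}\t{block}".encode("utf-8")
    if len(raw) > RECORD_SIZE - 1:
        raise ValueError(f"record for U+{codepoint:04X} is too long")
    return raw + b" " * (RECORD_SIZE - 1 - len(raw)) + b"\n"


def parse_query(query: str) -> int:
    """Accept a single character ('ø') or a hex codepoint ('1F99C' or 'U+1F99C')."""
    if len(query) == 1:
        return ord(query)
    hex_part = query.strip().upper().removeprefix("U+")
    return int(hex_part, 16)


def lookup(query: str, read_range: ReadRange, total_size: int) -> Optional[dict]:
    """Binary-search the records, reading one RECORD_SIZE slice per step."""
    target = parse_query(query)
    lo, hi = 0, total_size // RECORD_SIZE - 1
    steps = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        steps += 1
        record = read_range(mid * RECORD_SIZE, RECORD_SIZE)
        fields = record.decode("utf-8").rstrip(" \n").split("\t")
        codepoint = int(fields[0], 16)
        if codepoint == target:
            name, category, block = fields[1:]
            return {
                "codepoint": f"U+{codepoint:04X}",
                "name": name,
                "category": category,
                "block": block,
                "steps": steps,
            }
        if codepoint < target:
            lo = mid + 1  # target lies in the upper half
        else:
            hi = mid - 1  # target lies in the lower half
    return None


if __name__ == "__main__":
    data = b"".join(encode_record(*row) for row in SAMPLE_ROWS)

    def read_from_memory(start: int, length: int) -> bytes:
        # Stand-in for an HTTP range request against the remote file.
        return data[start:start + length]

    for q in ["ø", "1F99C", "U+0041", "00E9"]:
        print(q, lookup(q, read_from_memory, len(data)))
```

Fixed-width records are what make `mid * RECORD_SIZE` a valid offset to jump to. A file with variable-length lines would instead need to fetch a slightly larger chunk and skip forward to the next newline. With n records the loop makes at most floor(log2 n) + 1 reads, so the number of requests grows only logarithmically with the size of the file.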