The Sixth Layer: When a Local Model Beats the LLM That Trained It

In my last post I walked through five layers of token optimization that took a product classifier from $200+/month down to $25–40: context compression, two-stage prompting, exact-match lookup, similarity caching, and batching. Each layer attacked either the size of the context or the number of LLM calls. The post ended with the bill mostly tamed. But there was a layer I hadn’t built yet, and it’s the one that interests me most in hindsight, because it inverts the whole relationship: instead of making the LLM cheaper, you train a model to not need the LLM at all for most of the work — using the LLM’s own past output as training data. ...

May 7, 2026 · 11 min · 2220 words · Darek Dwornikowski

From $200 to $30: Five Layers of LLM Cost Optimization

The Problem: One of the services I’ve been building for an ecommerce app is a product categorizer: given a product name, assign it a 3-level category path from a large taxonomy. The app is in Polish, so both the product names and the category tree are in Polish. That matters less than you’d think, since LLMs handle Polish well, but it does mean the examples in this post would normally look like “Drzwi sosnowe 80cm” instead of “Wooden door 80cm.” I’ve translated everything to English here for readability. Simple on paper, until you’re classifying ~1M products a month and watching your LLM bill climb past $200. ...

April 24, 2026 · 9 min · 1854 words · Darek Dwornikowski