Integrating a new serialization format into your API pipeline might sound daunting, but TOON is designed to live alongside JSON, not to replace it. This guide covers the architectural patterns for injecting TOON serialization right before your data reaches the LLM provider.
The Middleware Pattern
The most common implementation is the Middleware Pattern: your internal services keep communicating in JSON as usual, while the service responsible for calling OpenAI, Anthropic, or another LLM API acts as the boundary where the conversion happens.
In a Node.js/Express environment, you can create a transform utility:
// utilities/toonConverter.js
function jsonToToon(jsonData) {
  // Tabular arrays of flat objects compress best; anything else falls back to JSON
  if (Array.isArray(jsonData) && jsonData.length > 0) {
    const keys = Object.keys(jsonData[0]);
    // Build the column header from the first object's keys
    let toon = `@cols(${keys.join(',')})`;
    // Append one pipe-delimited row per object, values in header order
    jsonData.forEach(item => {
      toon += '|' + keys.map(k => item[k]).join(',');
    });
    return toon;
  }
  return JSON.stringify(jsonData); // Fallback for non-tabular data
}

module.exports = { jsonToToon };
Handling Request/Response Cycles
When building a Chatbot API, you often retrieve history from a database (Postgres/Mongo). Instead of passing the raw JSON array of messages to the LLM context, pass it through the converter.
Before TOON:
const history = await db.getHistory(userId);
const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [
    { role: "system", content: "Analyze this data: " + JSON.stringify(history) }
  ]
});
After TOON:
const history = await db.getHistory(userId);
const toonData = jsonToToon(history); // Compression happens here
const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [
    { role: "system", content: "Analyze this TOON formatted data: " + toonData }
  ]
});
Client-Side Integration
For Single Page Applications (SPAs), you can perform the conversion in the browser before sending the prompt to your backend proxy. This shrinks the request payload over the network, reducing latency for users on mobile connections.
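As a rough sketch, assuming the same jsonToToon helper is bundled into the frontend and your backend proxy exposes a /api/prompt endpoint (both of which are illustrative assumptions), the conversion slots in right before the fetch call:

// client/promptClient.js (hypothetical frontend module)
import { jsonToToon } from './toonConverter';

export async function sendPrompt(records, question) {
  // Send the compact TOON string instead of the raw JSON array
  const response = await fetch('/api/prompt', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ question, data: jsonToToon(records) })
  });
  return response.json();
}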
Conclusion
You don't need to rewrite your database schema or your internal APIs. TOON is a "last mile" optimization: use it where the cost is highest, at the LLM API boundary.