Streaming is the baseline for conversational UX. A chat UI that renders in one chunk after 4 seconds feels broken; the same content rendered token-by-token feels alive. token-hub speaks OpenAI-compatible SSE, so the exact pattern you would write for OpenAI works unchanged.
This scenario shows a minimal Next.js chat app with a Route Handler that proxies a streaming request to token-hub.
Architecture
```
browser ──fetch──► /api/chat (Edge Route Handler)
                        │
                        ▼
       token-hub /v1/chat/completions (stream=true)
                        │
                        ▼
       upstream (Claude / GPT / Gemini / ...)
```
The Edge handler exists for two reasons: it hides your `sk-th_...` key from the browser, and it lets you add server-side logic (rate limiting, auth, logging) before forwarding.
The server route
```ts
// app/api/chat/route.ts
export const runtime = "edge";

const TOKENHUB_URL = "https://api.sandboxclaw.com/v1/chat/completions";

export async function POST(req: Request) {
  const { messages, model = "claude-3-5-sonnet-20241022" } = await req.json();

  const upstream = await fetch(TOKENHUB_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.TOKENHUB_KEY!}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model, messages, stream: true }),
  });

  if (!upstream.ok || !upstream.body) {
    return new Response(`Upstream error: ${upstream.status}`, { status: 502 });
  }

  // Pass the SSE stream through unchanged.
  return new Response(upstream.body, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache, no-transform",
      Connection: "keep-alive",
    },
  });
}
```
No special SDK, no adapter — token-hub’s stream format matches what browsers expect from any OpenAI-compatible backend.
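For reference, a single content frame on the wire looks roughly like this (abridged; exact ids and bookkeeping fields vary by upstream), followed eventually by the terminator:

```
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hel"},"finish_reason":null}]}

data: [DONE]
```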
The client
Minimal React that reads the stream and appends deltas to a message buffer:
```tsx
// app/page.tsx
"use client";
import { useState } from "react";

type Msg = { role: "user" | "assistant"; content: string };

export default function Chat() {
  const [msgs, setMsgs] = useState<Msg[]>([]);
  const [input, setInput] = useState("");

  async function send() {
    if (!input.trim()) return;
    const next: Msg[] = [...msgs, { role: "user", content: input }];
    // Optimistically append an empty assistant message to stream into.
    setMsgs([...next, { role: "assistant", content: "" }]);
    setInput("");

    const res = await fetch("/api/chat", {
      method: "POST",
      body: JSON.stringify({ messages: next }),
    });
    if (!res.body) return;

    const reader = res.body.getReader();
    const decoder = new TextDecoder();
    let buffer = "";

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      buffer += decoder.decode(value, { stream: true });

      // SSE frames end with \n\n; split and keep the trailing partial frame.
      const frames = buffer.split("\n\n");
      buffer = frames.pop() ?? "";

      for (const frame of frames) {
        const line = frame.replace(/^data:\s*/, "");
        if (line === "[DONE]") return;
        try {
          const chunk = JSON.parse(line);
          const delta = chunk.choices?.[0]?.delta?.content ?? "";
          if (delta) {
            // Replace the last message immutably; mutating prev in place
            // breaks React's change detection (and double-appends in
            // StrictMode).
            setMsgs((prev) => {
              const copy = [...prev];
              const last = copy[copy.length - 1];
              copy[copy.length - 1] = { ...last, content: last.content + delta };
              return copy;
            });
          }
        } catch {
          // Ignore keep-alive / comment frames that are not JSON.
        }
      }
    }
  }

  return (
    <div className="mx-auto max-w-2xl p-6">
      <div className="space-y-3">
        {msgs.map((m, i) => (
          <div key={i} className={m.role === "user" ? "text-right" : ""}>
            <span className="inline-block rounded-lg bg-slate-100 px-3 py-2">{m.content}</span>
          </div>
        ))}
      </div>
      <div className="mt-4 flex gap-2">
        <input
          value={input}
          onChange={(e) => setInput(e.target.value)}
          className="flex-1 rounded border p-2"
          placeholder="Ask something…"
        />
        <button onClick={send} className="rounded bg-blue-600 px-4 text-white">
          Send
        </button>
      </div>
    </div>
  );
}
```
Swapping models
Because the model string is just a parameter, you can let users pick from a dropdown:
```ts
const models = [
  "claude-3-5-sonnet-20241022",
  "gpt-4o",
  "gemini-2.0-flash",
  "deepseek-chat",
];
```
Pass the chosen value in the POST body. The rest of the stack does not change.
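A sketch of the client side, assuming a `model` state variable bound to that dropdown (the name is illustrative):

```ts
// Hypothetical: `model` is React state bound to the <select> element.
const res = await fetch("/api/chat", {
  method: "POST",
  body: JSON.stringify({ messages: next, model }),
});
```

The route handler above already destructures `model` from the body and falls back to a default when it is absent.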
Gotchas
- Keep the key server-side. Never ship `sk-th_...` to the browser. The Edge handler exists for this reason.
- Handle `[DONE]` explicitly. Some upstream streams send a trailing `data: [DONE]` frame; others close the connection. The client above handles both by breaking on `done` from the reader.
- Back-pressure. If the user closes the tab mid-stream, call `reader.cancel()` or tie the fetch to an `AbortController` (see the cancellation sketch after this list). Otherwise you are billed for tokens nobody sees.
- Token usage is on the final non-streamed response only. Streaming SSE frames do not include the `usage` block; we report usage separately via the dashboard. If you need per-request cost attribution at the client, issue a non-streaming call or fetch balance deltas after the conversation.
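A minimal cancellation sketch, assuming you hold the controller somewhere the UI can reach, such as a ref or a Stop button handler:

```ts
// Hypothetical: one AbortController per in-flight request.
const controller = new AbortController();

const res = await fetch("/api/chat", {
  method: "POST",
  body: JSON.stringify({ messages: next }),
  signal: controller.signal,
});

// On unmount or when the user hits Stop, abort the request.
// Aborting the fetch also cancels the response body's reader.
controller.abort();
```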
Where to take it
- Add auth in front of the route: use NextAuth or your own session check before forwarding.
- Persist conversations to a database after each assistant turn completes.
- For multi-turn tool use, switch to a non-streaming path once `tool_calls` appears, resolve locally, and stream the final message back (see the sketch below).
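A rough sketch of that branch on the server, assuming an OpenAI-style `tools` array and a local `runTool` helper; both names are hypothetical, not part of token-hub:

```ts
// Hypothetical first hop: non-streaming, so tool_calls arrive whole.
const first = await fetch(TOKENHUB_URL, {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.TOKENHUB_KEY!}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ model, messages, tools, stream: false }),
}).then((r) => r.json());

const calls = first.choices?.[0]?.message?.tool_calls ?? [];
if (calls.length > 0) {
  // Resolve each call locally, then append results in OpenAI's tool format.
  const results = await Promise.all(
    calls.map(async (tc: { id: string; function: { name: string; arguments: string } }) => ({
      role: "tool" as const,
      tool_call_id: tc.id,
      content: await runTool(tc.function.name, JSON.parse(tc.function.arguments)),
    }))
  );
  // Second hop: stream the final, user-facing answer exactly as before.
  const followUp = [...messages, first.choices[0].message, ...results];
  // ...forward { model, messages: followUp, stream: true } as in the route above.
}
```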