Streaming is the baseline for conversational UX. TokenHub speaks the OpenAI-compatible SSE format on /v1/chat/completions, so a server route written for an OpenAI-style backend can point at TokenHub by changing the base URL and bearer key.
This scenario keeps the TokenHub key on the server and uses moonshot-v1-8k, the public model that is currently smoke-tested.
Architecture
browser --fetch--> /api/chat
                       |
                       v
        token-hub /v1/chat/completions
                       |
                       v
        enabled upstream model channel
The route handler exists for two reasons: it hides your TokenHub key from the browser, and it gives you a place for product-side auth, rate limits, and logging before forwarding.
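As one illustration of the product-side checks mentioned above, here is a minimal sketch of a fixed-window rate limiter that a route handler could consult before forwarding. The name createRateLimiter, the key scheme, and the limits are hypothetical and not part of TokenHub; the in-memory Map is also an assumption that only holds within a single server instance.

```typescript
// Hypothetical fixed-window limiter; names and limits are illustrative only.
type Bucket = { start: number; count: number };

function createRateLimiter(limit: number, windowMs: number) {
  const buckets = new Map<string, Bucket>();

  // Returns true when the caller identified by `key` may proceed.
  return function allow(key: string, now: number = Date.now()): boolean {
    const current = buckets.get(key);
    // No bucket yet, or the window has elapsed: start a fresh window.
    if (!current || now - current.start >= windowMs) {
      buckets.set(key, { start: now, count: 1 });
      return true;
    }
    if (current.count >= limit) return false;
    current.count += 1;
    return true;
  };
}
```

In a handler this might be called with a user or session identifier, returning a 429 response when allow(...) is false. On edge or serverless runtimes, state like this is typically not shared across instances, so a shared store would be needed for a hard limit.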
The server route
// app/api/chat/route.ts
export const runtime = "edge";

const TOKENHUB_URL = "https://llm.sandboxclaw.com/v1/chat/completions";

export async function POST(req: Request) {
  const { messages } = await req.json();

  // Forward the conversation to TokenHub; the key never leaves the server.
  const upstream = await fetch(TOKENHUB_URL, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${process.env.TOKENHUB_KEY!}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "moonshot-v1-8k",
      messages,
      stream: true,
    }),
  });

  if (!upstream.ok || !upstream.body) {
    return new Response(`Upstream error: ${upstream.status}`, { status: 502 });
  }

  // Pipe the upstream SSE body straight through to the browser.
  return new Response(upstream.body, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache, no-transform",
      "Connection": "keep-alive",
    },
  });
}
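The handler above forwards whatever messages array the browser sends. A minimal sketch of validating the body before forwarding might look like the following; parseMessages, the allowed roles, and both limits are hypothetical choices for illustration, not TokenHub requirements.

```typescript
type ChatMessage = { role: "user" | "assistant"; content: string };

// Hypothetical limits; tune them for your product.
const ALLOWED_ROLES = ["user", "assistant"] as const;
const MAX_MESSAGES = 50;
const MAX_CONTENT_CHARS = 8000;

// Returns the validated messages, or null if the body is not acceptable.
function parseMessages(body: unknown): ChatMessage[] | null {
  if (typeof body !== "object" || body === null) return null;
  const raw = (body as { messages?: unknown }).messages;
  if (!Array.isArray(raw) || raw.length === 0 || raw.length > MAX_MESSAGES) {
    return null;
  }

  const out: ChatMessage[] = [];
  for (const item of raw) {
    if (typeof item !== "object" || item === null) return null;
    const candidate = item as { role?: unknown; content?: unknown };
    // Only roles the browser is allowed to send; "system" is rejected.
    const role = ALLOWED_ROLES.find((r) => r === candidate.role);
    if (!role) return null;
    const content = candidate.content;
    if (typeof content !== "string" || content.length > MAX_CONTENT_CHARS) {
      return null;
    }
    out.push({ role, content });
  }
  return out;
}
```

A route could call parseMessages(await req.json()) and return a 400 response on null, so that client-supplied system prompts or oversized payloads never reach the upstream model.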
The client
Minimal React that reads SSE frames and appends deltas to the last assistant message:
// app/page.tsx
"use client";

import { useState } from "react";

type Msg = { role: "user" | "assistant"; content: string };

export default function Chat() {
  const [msgs, setMsgs] = useState<Msg[]>([]);
  const [input, setInput] = useState("");

  async function send() {
    const next: Msg[] = [...msgs, { role: "user", content: input }];
    // Add an empty assistant message that the stream will fill in.
    setMsgs([...next, { role: "assistant", content: "" }]);
    setInput("");

    const res = await fetch("/api/chat", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ messages: next }),
    });
    if (!res.ok || !res.body) return;

    const reader = res.body.getReader();
    const decoder = new TextDecoder();
    let buffer = "";

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      buffer += decoder.decode(value, { stream: true });

      // SSE frames are separated by a blank line; keep the trailing partial frame.
      const frames = buffer.split("\n\n");
      buffer = frames.pop() ?? "";

      for (const frame of frames) {
        const line = frame.replace(/^data:\s*/, "").trim();
        if (line === "[DONE]") return;
        try {
          const chunk = JSON.parse(line);
          const delta = chunk.choices?.[0]?.delta?.content ?? "";
          if (delta) {
            setMsgs((prev) => {
              // Replace the last message instead of mutating it, so the
              // updater stays pure (React may call it more than once).
              const last = prev[prev.length - 1];
              return [...prev.slice(0, -1), { ...last, content: last.content + delta }];
            });
          }
        } catch {
          // Ignore keep-alive frames.
        }
      }
    }
  }

  return (
    <div className="mx-auto max-w-2xl p-6">
      <div className="space-y-3">
        {msgs.map((m, i) => (
          <div key={i} className={m.role === "user" ? "text-right" : ""}>
            <span className="inline-block rounded-lg bg-slate-100 px-3 py-2">
              {m.content}
            </span>
          </div>
        ))}
      </div>
      <div className="mt-4 flex gap-2">
        <input
          value={input}
          onChange={(e) => setInput(e.target.value)}
          className="flex-1 rounded border p-2"
          placeholder="Ask something..."
        />
        <button onClick={send} className="rounded bg-blue-600 px-4 text-white">
          Send
        </button>
      </div>
    </div>
  );
}
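The frame handling inside the read loop can also be pulled out into a pure function, which makes it easier to test without a network. The sketch below is one way to do that, assuming the OpenAI-style chunk shape (choices[0].delta.content) used in the component; parseSseBuffer and ParseResult are hypothetical names. It additionally normalizes CRLF line endings and skips frames that carry no data: lines, which the inline version treats as parse failures.

```typescript
type ParseResult = { deltas: string[]; rest: string; done: boolean };

// Splits an accumulated SSE buffer into complete frames and extracts deltas.
function parseSseBuffer(buffer: string): ParseResult {
  // Normalize CRLF so frames split on a blank line regardless of line endings.
  const frames = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = frames.pop() ?? ""; // trailing partial frame, kept for the next read
  const deltas: string[] = [];

  for (const frame of frames) {
    // Join the payload of every "data:" line in the frame.
    const data = frame
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).trimStart())
      .join("\n");

    if (data === "") continue; // comment or keep-alive frame
    if (data === "[DONE]") return { deltas, rest: "", done: true };

    try {
      const delta = JSON.parse(data)?.choices?.[0]?.delta?.content;
      if (typeof delta === "string" && delta !== "") deltas.push(delta);
    } catch {
      // Skip frames whose payload is not JSON.
    }
  }
  return { deltas, rest, done: false };
}
```

In the component, each read would append the decoded text to the buffer, call parseSseBuffer(buffer), apply the returned deltas, carry rest forward as the new buffer, and stop when done is true.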
Gotchas
- Keep the key server-side. Never ship TOKENHUB_KEY to the browser.
- Handle [DONE] explicitly. Some streams send a trailing data: [DONE] frame; others close the connection.
- Abort abandoned streams. Tie the fetch to an AbortController when users navigate away.
- Treat model lists as live data. Show only models returned by the authenticated model list or documented as verified.
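For the point about aborting abandoned streams, a minimal sketch of a helper that keeps at most one stream in flight is shown here. createStreamGuard and its begin/cancel methods are hypothetical names for illustration; only AbortController and AbortSignal are standard APIs.

```typescript
// Hypothetical helper: keeps at most one in-flight stream per chat view.
function createStreamGuard() {
  let current: AbortController | null = null;

  return {
    // Call before each fetch; aborts the previous stream if one is still tracked.
    begin(): AbortSignal {
      current?.abort();
      current = new AbortController();
      return current.signal;
    },
    // Call when the user navigates away or the component unmounts.
    cancel(): void {
      current?.abort();
      current = null;
    },
  };
}
```

On the client, the signal returned by begin() would be passed as the signal option of fetch("/api/chat", ...), and cancel() would run in a useEffect cleanup. An aborted fetch rejects with an AbortError, so the caller should catch it and treat it as a normal cancellation rather than a failure.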