Streaming is the baseline for conversational UX. A chat UI that renders in one chunk after 4 seconds feels broken; the same content rendered token-by-token feels alive. token-hub speaks OpenAI-compatible SSE, so the exact pattern you would write for OpenAI works unchanged.
This scenario shows a minimal Next.js chat app with a Route Handler that proxies a streaming request to token-hub.
Architecture
```
browser ──fetch──► /api/chat (Edge Route Handler)
                        │
                        ▼
       token-hub /v1/chat/completions (stream=true)
                        │
                        ▼
       upstream (Claude / GPT / Gemini / ...)
```
The Edge handler exists for two reasons: it hides your `sk-th_...` key from the browser, and it lets you add server-side logic (rate limiting, auth, logging) before forwarding.
The server route
```ts
// app/api/chat/route.ts
export const runtime = "edge";

const TOKENHUB_URL = "https://api.sandboxclaw.com/v1/chat/completions";

export async function POST(req: Request) {
  const { messages, model = "claude-3-5-sonnet-20241022" } = await req.json();

  const upstream = await fetch(TOKENHUB_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.TOKENHUB_KEY!}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model, messages, stream: true }),
  });

  if (!upstream.ok || !upstream.body) {
    return new Response(`Upstream error: ${upstream.status}`, { status: 502 });
  }

  // Pass the SSE stream through unchanged.
  return new Response(upstream.body, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache, no-transform",
      Connection: "keep-alive",
    },
  });
}
```
No special SDK, no adapter — token-hub’s stream format matches what browsers expect from any OpenAI-compatible backend.
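For reference, a single content frame on the wire looks roughly like this (abridged; exact ids and bookkeeping fields vary by upstream), followed eventually by the terminator:

```
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hel"},"finish_reason":null}]}

data: [DONE]
```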
The client
Minimal React that reads the stream and appends deltas to a message buffer:
```tsx
// app/page.tsx
"use client";
import { useState } from "react";

type Msg = { role: "user" | "assistant"; content: string };

export default function Chat() {
  const [msgs, setMsgs] = useState<Msg[]>([]);
  const [input, setInput] = useState("");

  async function send() {
    if (!input.trim()) return;
    const next: Msg[] = [...msgs, { role: "user", content: input }];
    // Optimistically append an empty assistant message to stream into.
    setMsgs([...next, { role: "assistant", content: "" }]);
    setInput("");

    const res = await fetch("/api/chat", {
      method: "POST",
      body: JSON.stringify({ messages: next }),
    });
    if (!res.body) return;

    const reader = res.body.getReader();
    const decoder = new TextDecoder();
    let buffer = "";

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      buffer += decoder.decode(value, { stream: true });

      // SSE frames end with \n\n; split and keep the trailing partial frame.
      const frames = buffer.split("\n\n");
      buffer = frames.pop() ?? "";

      for (const frame of frames) {
        const line = frame.replace(/^data:\s*/, "");
        if (line === "[DONE]") return;
        try {
          const chunk = JSON.parse(line);
          const delta = chunk.choices?.[0]?.delta?.content ?? "";
          if (delta) {
            // Replace the last message immutably; mutating prev in place
            // breaks React's change detection (and double-appends in
            // StrictMode).
            setMsgs((prev) => {
              const copy = [...prev];
              const last = copy[copy.length - 1];
              copy[copy.length - 1] = { ...last, content: last.content + delta };
              return copy;
            });
          }
        } catch {
          // Ignore keep-alive / comment frames that are not JSON.
        }
      }
    }
  }

  return (
    <div className="mx-auto max-w-2xl p-6">
      <div className="space-y-3">
        {msgs.map((m, i) => (
          <div key={i} className={m.role === "user" ? "text-right" : ""}>
            <span className="inline-block rounded-lg bg-slate-100 px-3 py-2">{m.content}</span>
          </div>
        ))}
      </div>
      <div className="mt-4 flex gap-2">
        <input
          value={input}
          onChange={(e) => setInput(e.target.value)}
          className="flex-1 rounded border p-2"
          placeholder="Ask something…"
        />
        <button onClick={send} className="rounded bg-blue-600 px-4 text-white">
          Send
        </button>
      </div>
    </div>
  );
}
```
Swapping models
Because the model string is just a parameter, you can let users pick from a dropdown:
```ts
const models = [
  "claude-3-5-sonnet-20241022",
  "gpt-4o",
  "gemini-2.0-flash",
  "deepseek-chat",
];
```
Pass the chosen value in the POST body. The rest of the stack does not change.
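A sketch of the client side, assuming a `model` state variable bound to that dropdown (the name is illustrative):

```ts
// Hypothetical: `model` is React state bound to the <select> element.
const res = await fetch("/api/chat", {
  method: "POST",
  body: JSON.stringify({ messages: next, model }),
});
```

The route handler above already destructures `model` from the body and falls back to a default when it is absent.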
Gotchas
- Keep the key server-side. Never ship `sk-th_...` to the browser. The Edge handler exists for this reason.
- Handle `[DONE]` explicitly. Some upstream streams send a trailing `data: [DONE]` frame; others close the connection. The client above handles both by breaking on `done` from the reader.
- Back-pressure. If the user closes the tab mid-stream, call `reader.cancel()` or tie the fetch to an `AbortController` (see the cancellation sketch after this list). Otherwise you are billed for tokens nobody sees.
- Token usage is on the final non-streamed response only. Streaming SSE frames do not include the `usage` block; we report usage separately via the dashboard. If you need per-request cost attribution at the client, issue a non-streaming call or fetch balance deltas after the conversation.
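A minimal cancellation sketch, assuming you hold the controller somewhere the UI can reach, such as a ref or a Stop button handler:

```ts
// Hypothetical: one AbortController per in-flight request.
const controller = new AbortController();

const res = await fetch("/api/chat", {
  method: "POST",
  body: JSON.stringify({ messages: next }),
  signal: controller.signal,
});

// On unmount or when the user hits Stop, abort the request.
// Aborting the fetch also cancels the response body's reader.
controller.abort();
```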
Where to take it
- Add auth in front of the route: use NextAuth or your own session check before forwarding.
- Persist conversations to a database after each assistant turn completes.
- For multi-turn tool use, switch to a non-streaming path once `tool_calls` appears, resolve locally, and stream the final message back (see the sketch below).
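A rough sketch of that branch on the server, assuming an OpenAI-style `tools` array and a local `runTool` helper; both names are hypothetical, not part of token-hub:

```ts
// Hypothetical first hop: non-streaming, so tool_calls arrive whole.
const first = await fetch(TOKENHUB_URL, {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.TOKENHUB_KEY!}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ model, messages, tools, stream: false }),
}).then((r) => r.json());

const calls = first.choices?.[0]?.message?.tool_calls ?? [];
if (calls.length > 0) {
  // Resolve each call locally, then append results in OpenAI's tool format.
  const results = await Promise.all(
    calls.map(async (tc: { id: string; function: { name: string; arguments: string } }) => ({
      role: "tool" as const,
      tool_call_id: tc.id,
      content: await runTool(tc.function.name, JSON.parse(tc.function.arguments)),
    }))
  );
  // Second hop: stream the final, user-facing answer exactly as before.
  const followUp = [...messages, first.choices[0].message, ...results];
  // ...forward { model, messages: followUp, stream: true } as in the route above.
}
```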