Project Description
generative-webview
Why we’re building this
Right now, if you want to access information on the internet, you have two options — and both are bad in opposite ways:
- Browse it yourself in a regular browser. Even the new “AI browsers” are dumb about this: they bolt a chat sidebar onto a static page that was designed for some imaginary average reader, not you. The page doesn’t change based on what you already know, what you’re trying to do, or where you came from.
- Hand it off to a chatbot. ChatGPT, Claude, and friends will happily go fetch a URL — but the answer comes back as a wall of text in a chat bubble. You lose every chart, every diagram, every comparison table, every visual hierarchy that made the source page worth reading in the first place.
generative-webview is the middle ground. It uses generative AI not to replace the web with chat, and not to bolt chat onto the web, but to regenerate the web itself around you. You paste a URL and the agent renders a real, interactive page — with components, charts, timelines, comparisons, hero imagery — but composed specifically for what you care about, what you already know, and the path you took to get there.
It is personalized browsing. Every page is rebuilt from scratch for the person reading it.
What it does
Paste any URL. An AI agent composes a bespoke, interactive reading interface in real time from a curated React component library. The interface itself is the answer — not a chat reply, not a summary panel.
- A Wikipedia article on the Roman Empire renders as a hero narrative, an interactive timeline, a population chart, and an East/West comparison table.
- A React Server Components blog post becomes code-forward sections, pull quotes, and concept callouts.
- Same engine, different composition — because the agent’s job is to choose the shape of the answer, not just the words.
You steer it live:
- Long-form preferences — “I know linear algebra but not differential geometry, prefer visual explanations”
- Tone — formal · balanced · experimental
- Density — sparse · comfortable · dense
- Visual richness — minimal · balanced · rich
- Per-component feedback — refine one block (“make this timeline span only the late Republic”) without throwing away the rest of the page
- Browsing memory — the agent remembers recent pages you’ve visited so it can compare, contrast, and link forward as you read
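As a rough sketch of how these controls could travel to the agent, the snippet below bundles the preference knobs and a bounded recent-pages memory into one context object. The names `ReaderPreferences`, `BrowsingMemory`, `build_agent_context`, and the five-page limit are illustrative assumptions, not the project's actual code.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Literal

# The three option sets come straight from the list above.
Tone = Literal["formal", "balanced", "experimental"]
Density = Literal["sparse", "comfortable", "dense"]
Richness = Literal["minimal", "balanced", "rich"]


@dataclass
class ReaderPreferences:
    """Hypothetical container for the steering controls."""
    long_form: str = ""
    tone: Tone = "balanced"
    density: Density = "comfortable"
    visual_richness: Richness = "balanced"


@dataclass
class BrowsingMemory:
    """Keeps only the most recent pages (the limit of 5 is an assumption)."""
    max_pages: int = 5
    pages: Deque[str] = field(default_factory=deque)

    def remember(self, url: str) -> None:
        # Revisiting a page moves it to the most-recent position.
        if url in self.pages:
            self.pages.remove(url)
        self.pages.append(url)
        # Drop the oldest entries once the limit is exceeded.
        while len(self.pages) > self.max_pages:
            self.pages.popleft()


def build_agent_context(prefs: ReaderPreferences, memory: BrowsingMemory) -> dict:
    """Bundle preferences and recent pages into one dict for the agent."""
    return {
        "preferences": {
            "long_form": prefs.long_form,
            "tone": prefs.tone,
            "density": prefs.density,
            "visual_richness": prefs.visual_richness,
        },
        "recent_pages": list(memory.pages),  # oldest first
    }
```

Toggling a control would then mean rebuilding this context and asking the agent to compose again, rather than patching the rendered page by hand.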
Beyond text-based chat
Most LLM products ship intelligence as text in a box. generative-webview inverts that: the model’s primary output is structure. The agent picks which components to render, in what order, with what data, and at what density. The frontend has no hardcoded layout — every block on screen is an agent-emitted CompositionBlock, and toggling a preference re-composes the page without a reload.
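To make "the output is structure" concrete, here is a minimal sketch of what a page built from blocks might look like and how refining one block could leave the rest untouched. Only the name `CompositionBlock` and the component names come from this write-up; the `id`, `component`, and `props` fields and the `refine_block` helper are assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass, field, replace
from typing import Any


@dataclass(frozen=True)
class CompositionBlock:
    """One agent-emitted block (field names are hypothetical)."""
    id: str
    component: str  # e.g. "Timeline", "ComparisonTable"
    props: dict[str, Any] = field(default_factory=dict)


def refine_block(
    page: list[CompositionBlock], block_id: str, new_props: dict[str, Any]
) -> list[CompositionBlock]:
    """Return a new page where only the matching block's props change."""
    return [
        replace(block, props={**block.props, **new_props})
        if block.id == block_id
        else block
        for block in page
    ]


# A page is just an ordered list of blocks; there is no fixed layout.
page = [
    CompositionBlock("hero-1", "NarrativeSection", {"title": "The Roman Empire"}),
    CompositionBlock("timeline-1", "Timeline", {"range": "full history"}),
    CompositionBlock("table-1", "ComparisonTable", {"columns": ["East", "West"]}),
]

# Per-component feedback: narrow the timeline, keep everything else as-is.
refined = refine_block(page, "timeline-1", {"range": "late Republic"})

# The serialized form is what a frontend registry could render block by block.
payload = json.dumps([asdict(block) for block in refined])
```

Because untouched blocks are reused rather than regenerated, the frontend can keep them mounted and only re-render the one that changed.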
Originality
Existing generative-UI demos are scoped to one vertical — a finance dashboard, a travel form, a chart playground. generative-webview takes the most generic input imaginable — any URL — and emits a content-aware, reader-aware composition from a shared component vocabulary. The agent acts as a layout director, not a copywriter. The same engine works for an encyclopedia article, a technical blog post, or a news story.
Technical execution
End-to-end working code, runnable via `docker compose up`:
- AG-UI protocol streams `RUN_STARTED` / `STATE_SNAPSHOT` / `RUN_FINISHED` events over SSE from FastAPI directly to the browser, with no Next.js proxy.
- CopilotKit hooks (`useCoAgent`) wire shared agent state into a typed component registry: `NarrativeSection`, `Timeline`, `ComparisonTable`, `DataViz`, `PullQuote`, `RelatedEntities`, `CtaCard`, `HeroImage`.
- A2UI primitives + Pydantic contracts define the durable schema. Invalid model output cannot reach the frontend.
- Gemini 2.5 Flash structured outputs (Vertex AI) compose the surface; Gemini 2.5 Flash Image (“Nano Banana”) generates layout-aware hero art. OpenAI is a drop-in fallback.
- Session memory keeps a per-reader profile and recent-pages context so the agent can personalize and cross-reference across a browsing session.
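The validate-then-stream flow described above can be sketched with the standard library alone. This is a simplified stand-in, not the project's implementation: the real backend uses Pydantic contracts and FastAPI, while here a plain function does the validation and a generator produces SSE frames. The event payload field names (`runId`, `snapshot`, `blocks`) and the helper names are assumptions.

```python
import json
from typing import Any, Iterator

# Component names from the registry listed above.
ALLOWED_COMPONENTS = {
    "NarrativeSection", "Timeline", "ComparisonTable", "DataViz",
    "PullQuote", "RelatedEntities", "CtaCard", "HeroImage",
}


def validate_blocks(raw: Any) -> list[dict[str, Any]]:
    """Reject anything that is not a list of known, well-formed blocks."""
    if not isinstance(raw, list):
        raise ValueError("composition must be a list of blocks")
    for index, block in enumerate(raw):
        if not isinstance(block, dict):
            raise ValueError(f"block {index} is not an object")
        component = block.get("component")
        if not isinstance(component, str) or component not in ALLOWED_COMPONENTS:
            raise ValueError(f"block {index} has unknown component {component!r}")
        if not isinstance(block.get("props"), dict):
            raise ValueError(f"block {index} is missing a props object")
    return raw


def sse_frame(event: dict[str, Any]) -> str:
    """Format one event as a Server-Sent Events data frame."""
    return f"data: {json.dumps(event)}\n\n"


def run_events(run_id: str, model_output: Any) -> Iterator[str]:
    """Yield the three event frames for one run, validating first."""
    # Validation happens before the first frame is produced, so an
    # invalid composition raises instead of reaching the stream.
    blocks = validate_blocks(model_output)
    yield sse_frame({"type": "RUN_STARTED", "runId": run_id})
    yield sse_frame({"type": "STATE_SNAPSHOT", "snapshot": {"blocks": blocks}})
    yield sse_frame({"type": "RUN_FINISHED", "runId": run_id})
```

In a FastAPI app, a generator like `run_events` could be wrapped in a `StreamingResponse` with media type `text/event-stream`; that wiring is left out here to keep the sketch dependency-free.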
Stack
| Layer | Technology |
|---|---|
| Generative UI vocabulary | A2UI primitives + extended components |
| Agent transport | AG-UI over HTTP/SSE |
| Frontend runtime | CopilotKit + Next.js 15 + React 19 + Tailwind |
| Backend runtime | FastAPI exposing CopilotKit-compatible `/copilotkit` routes |
| Constrained generation | Gemini 2.5 Flash + OpenAI structured outputs, validated by Pydantic |
| Image generation | Gemini 2.5 Flash Image (Vertex AI) |
| Charts | Recharts |
A2UI gives us a vocabulary, AG-UI gives us a transport, CopilotKit gives us the React surface area, and Pydantic-constrained models give us output the UI can trust. The result is a generative experience that takes the middle ground between a dumb browser and a chat box — and renders live, personalized, content-aware web pages from a single URL.
video at: https://www.loom.com/share/6cdc4eb0bbe244b8a09c7321b31eccaa