
CanvasX
AI-powered mobile UI/UX design tool that generates beautiful, production-ready mobile app screens from text prompts in minutes — powered by Google Gemini AI and real-time streaming.
Key Challenges
- Two-Phase AI Generation Pipeline
- Real-time Streaming with Inngest
- 22-Theme CSS Variable System
- AI Prompt Engineering for Pixel-Perfect HTML
- Interactive Canvas with Zoom/Pan/Pinch
- Server-Side Screenshot Export via Puppeteer
- Subscription-Gated Feature Access
- Rate Limiting & Redis Caching
- Iterative Screen Generation with Context
Key Learnings
- Google Gemini AI Integration (Structured Output + Tool Use)
- Inngest Background Jobs & Realtime Streaming
- Advanced AI Prompt Engineering for UI Generation
- CSS Variable-Based Theming Architecture
- Prisma ORM with NeonDB
- Polar.sh Payment & Subscription Integration
- NextAuth v5 (Google OAuth + Credentials)
- Puppeteer Server-Side Rendering
- Upstash Redis Caching & Rate Limiting
- Vercel AI SDK with Tool Calling
CanvasX: AI-Powered Mobile UI/UX Design Tool
Overview
CanvasX is a full-stack AI application that transforms text prompts into Dribbble-quality, production-ready mobile app screens in under 90 seconds. Built with Next.js 16 and powered by Google Gemini 2.5 Flash, it uses a two-phase AI generation pipeline — first planning the screen architecture, then generating pixel-perfect HTML/CSS — all streamed to the user in real-time via Inngest. Users describe the app they want, pick from 22 curated design themes, and watch their screens materialize live inside interactive iPhone device frames on a zoomable canvas.
What Users Can Do
- Text-to-UI Generation: Describe any mobile app idea in natural language and get 2–3 production-ready screens generated automatically
- 22 Built-in Themes: Choose from curated themes like Ocean Breeze, Netflix Dark, Acid Lime, Neo-Brutalism, Glassmorphism, Cyber, Midnight, and more
- Interactive Canvas: Zoom, pan, and pinch to navigate generated screens rendered inside realistic iPhone device frames
- Real-time Streaming: Watch screens generate live with progress stages — analyzing, planning, generating, completing
- Iterative Design: Add more screens to existing projects with context-aware generation that maintains design consistency
- Theme Switching: Instantly switch between all 22 themes on the canvas to preview different visual styles
- PNG Export: Download high-quality screenshots of individual screens via server-side Puppeteer rendering
- HTML Code View: Inspect and copy the raw HTML/CSS code behind any generated screen
- Project Management: Create, browse, and manage multiple design projects from a unified dashboard
- Subscription Plans: Free (2 projects, 10 generations/month), Pro ($6/month), and Unlimited ($20/month) tiers
Why I Built This
I built CanvasX to solve real pain points in the early-stage mobile app design process:
- Design Bottleneck: Turning an app idea into visual mockups traditionally requires hours of manual design work or expensive tools
- Tool Complexity: Tools like Figma have a steep learning curve for developers who just want to visualize their ideas quickly
- Inconsistency: Manually designing multiple screens often leads to inconsistent styling, spacing, and component patterns
- Theme Exploration: Trying different visual styles on a design is time-consuming — you have to manually restyle everything
- No AI-Native Design Tool: Existing AI tools generate single static images, not structured, themeable, production-ready HTML/CSS screens
Tech Stack
Frontend
- Next.js 16 (App Router + Turbopack): React framework for both UI and API routes
- TypeScript: End-to-end type safety with Prisma-generated types
- Tailwind CSS 4: Utility-first styling for the application UI
- Shadcn UI: Accessible component primitives (Radix UI)
- Motion (Framer Motion): Page transitions and micro-animations
- React Zoom Pan Pinch: Interactive canvas with zoom, pan, and pinch gestures
- React Resizable Panels: Adjustable layout panels in the editor
- TanStack Query: Server state management with automatic cache invalidation
- React Context API: Client-side state for canvas, frames, themes, and generation status
Backend & Services
- Google Gemini 2.5 Flash: AI model for screen analysis/planning and HTML generation
- Vercel AI SDK: Unified interface for AI model calls with structured output and tool use
- Unsplash API: AI tool integration — Gemini calls `searchUnsplash` to find real images during generation
- NeonDB (Serverless PostgreSQL): Primary database for users, projects, frames, and subscriptions
- Prisma ORM: Type-safe database client with migrations
- Inngest: Durable background job execution with realtime event streaming
- NextAuth v5: Authentication with Google OAuth and credentials provider
- Puppeteer / Puppeteer-Core: Server-side headless browser for PNG screenshot export
- Upstash Redis: Response caching and API rate limiting
- Polar.sh: Payment processing and subscription lifecycle management
- DOMPurify + JSDOM: Server-side HTML sanitization for AI-generated content
Key Features
Two-Phase AI Generation Pipeline
The core innovation of CanvasX is splitting screen generation into two distinct AI phases, each optimized for its role:
- Phase 1 — Analysis & Planning: The user's prompt is sent to Gemini with the `ANALYSIS_PROMPT` system instruction. Gemini returns a structured JSON object (validated via Zod) containing:
  - `theme`: Best-matching theme ID from 22 options
  - `screens[]`: Array of 2–3 screen specs, each with `id`, `name`, `purpose`, and a highly detailed `visualDescription` covering exact layout, chart types, icon names, data values, and bottom navigation configuration
- Phase 2 — HTML Generation: For each planned screen, a second Gemini call generates self-contained HTML/CSS using:
  - Tailwind v3 utility classes for layout and styling
  - CSS custom properties (`var(--primary)`, `var(--background)`, etc.) for theme colors
  - SVG-only charts (area, line, circular progress, donut — never canvas or divs)
  - Iconify icons (`lucide:*` set) for all iconography
  - Real images via `searchUnsplash` tool calling
  - Realistic placeholder data ("8,432 steps", "$12.99", "7h 20m" — not generic text)
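The Phase 1 contract can be sketched as a plain TypeScript type with a hand-rolled validator standing in for the project's Zod schema (field names follow the description above; everything else here is illustrative):

```typescript
// Hypothetical sketch of the Phase 1 plan shape; the real project
// validates this with Zod, this is a minimal stand-in.
type ScreenSpec = {
  id: string;
  name: string;
  purpose: string;
  visualDescription: string; // exact layout, charts, icons, data, bottom nav
};

type ScreenPlan = {
  theme: string;         // one of the 22 theme IDs
  screens: ScreenSpec[]; // 2–3 screen specs
};

// Minimal validator: rejects malformed JSON plans before Phase 2 runs.
function parsePlan(raw: string): ScreenPlan {
  const data = JSON.parse(raw);
  if (typeof data.theme !== "string") throw new Error("plan.theme must be a string");
  if (!Array.isArray(data.screens) || data.screens.length < 2 || data.screens.length > 3) {
    throw new Error("plan.screens must contain 2-3 screens");
  }
  for (const s of data.screens) {
    for (const key of ["id", "name", "purpose", "visualDescription"]) {
      if (typeof s[key] !== "string") throw new Error(`screen.${key} must be a string`);
    }
  }
  return data as ScreenPlan;
}
```

Validating the plan between the two phases means a malformed Phase 1 response fails fast instead of producing broken HTML in Phase 2.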
22-Theme CSS Variable Architecture
- Each theme defines ~20 CSS custom properties: `--background`, `--foreground`, `--card`, `--primary`, `--accent`, `--muted`, `--border`, `--chart-1` through `--chart-5`, and more
- Themes range from light (Ocean Breeze, Swiss Style, Peach) to dark (Netflix, Acid Lime, Cyber, Midnight, Neon) to special effects (Glassmorphism, Neo-Brutalism)
- Base variables provide shared typography (`--font-sans`, `--font-heading`, `--font-serif`, `--font-mono`) and shadow scales
- Theme CSS is injected into each frame's wrapper HTML, making generated screens instantly re-themeable without regeneration
- The AI is instructed to use CSS variables for all foundational colors, ensuring perfect theme adherence
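The injection step can be sketched as a small helper that wraps a frame's HTML with a `:root` block of theme variables (variable names match the list above; the wrapper structure is an assumption, not the project's actual code):

```typescript
// Sketch: inject theme variables into a frame's wrapper HTML so the
// same generated screen can be re-themed without regeneration.
type Theme = Record<string, string>; // e.g. { "--primary": "#6d28d9", ... }

function wrapFrameHtml(theme: Theme, frameHtml: string): string {
  const vars = Object.entries(theme)
    .map(([name, value]) => `  ${name}: ${value};`)
    .join("\n");
  return `<!doctype html>
<html>
<head>
  <meta name="viewport" content="width=device-width, initial-scale=1" />
  <style>:root {\n${vars}\n}</style>
</head>
<body>${frameHtml}</body>
</html>`;
}
```

Switching themes only swaps the `:root` block; because the generated markup references `var(--primary)` and friends, the identical HTML re-renders in a new palette.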
Real-time Generation Streaming
- Inngest's `@inngest/realtime` middleware publishes live progress events on a per-user channel (`user:{userId}`)
- Event stages flow to the frontend: `generation.start` → `analysis.start` → `analysis.complete` (skeleton frames appear with loading states) → `frame.created` (each screen renders as it completes) → `generation.complete`
- The `RealtimeProvider` context subscribes to events, updates frame state, and manages loading/error UI
- A fallback timeout (60s) catches cases where the backend never responds; a generation timeout (5 min) catches stuck jobs
- Toast notifications provide real-time feedback for completion and error states
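How the provider could fold those streamed stages into frame state can be sketched as a plain reducer (event names come from the list above; the state shape is an assumption, not the project's actual context):

```typescript
// Sketch: reduce streamed generation events into frame state.
type Frame = { id: string; html?: string; loading: boolean };
type GenState = { stage: string; frames: Frame[] };

type GenEvent =
  | { type: "generation.start" }
  | { type: "analysis.start" }
  | { type: "analysis.complete"; screenIds: string[] } // skeletons appear
  | { type: "frame.created"; id: string; html: string }
  | { type: "generation.complete" };

function reduce(state: GenState, event: GenEvent): GenState {
  switch (event.type) {
    case "analysis.complete":
      // Show one loading skeleton per planned screen.
      return {
        stage: event.type,
        frames: event.screenIds.map((id) => ({ id, loading: true })),
      };
    case "frame.created":
      // Fill in the matching skeleton as each screen completes.
      return {
        stage: event.type,
        frames: state.frames.map((f) =>
          f.id === event.id ? { ...f, html: event.html, loading: false } : f
        ),
      };
    default:
      return { ...state, stage: event.type };
  }
}
```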
Context-Aware Iterative Generation
- When adding screens to an existing project, the system sends the last 4 frames' HTML as `CONTEXT HTML` to the analysis prompt
- This ensures new screens maintain visual consistency: matching bottom navigation, consistent component styles, coherent color usage, and aligned spacing
- The existing theme is preserved — the AI doesn't re-select a theme for iterative generations
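Assembling that context can be sketched as a sliding-window concatenation (the 4-frame window comes from the description above; the delimiter format is an assumption):

```typescript
// Sketch: build CONTEXT HTML from the most recent frames so iterative
// generations can match existing navigation, components, and spacing.
type StoredFrame = { title: string; htmlContent: string };

function buildContextHtml(frames: StoredFrame[], window = 4): string {
  return frames
    .slice(-window) // only the newest frames, to keep the prompt small
    .map((f) => `<!-- screen: ${f.title} -->\n${f.htmlContent}`)
    .join("\n\n");
}
```

Capping the window bounds prompt size, so context cost stays flat no matter how large a project grows.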
Interactive Canvas & Device Frames
- Generated HTML is rendered inside realistic iPhone device frames using iframe sandboxing
- `react-zoom-pan-pinch` provides smooth canvas navigation with zoom controls
- A floating toolbar offers actions: theme switching, HTML code view, screenshot export
- Each frame has its own toolbar for individual actions (download, view code, delete)
- Canvas supports multiple frames laid out in a responsive grid
Server-Side Screenshot Export
- PNG export uses Puppeteer (full) in development and `@sparticuz/chromium-min` + Puppeteer-Core in production (for serverless environments)
- The frame's HTML is wrapped in a complete HTML document with theme CSS variables, viewport meta tags, and Google Fonts
- Screenshots are captured at iPhone dimensions with proper device pixel ratio for high-quality output
AI-Powered Project Naming
- When a user submits a prompt, a separate Gemini call generates a concise project name (under 5 words) based on the prompt content
- This runs as a server action before project creation, giving every project a meaningful name automatically
Technical Implementation
Generation Flow
- User types a prompt on the home page (e.g., "Fitness tracker app with dark theme")
- Server action calls Gemini to generate a short project name
- Project record is created in NeonDB via Prisma
- Inngest event `ui/generate.screen` is triggered with prompt, projectId, userId, and existing frames (if any)
- User is navigated to `/project/[id]`, where the canvas UI subscribes to realtime events
- Inngest Step 1 — `analyze-and-plan-screen`: Gemini analyzes the prompt → returns JSON plan with theme + screen specs
- Project's theme is updated in DB; frontend receives skeleton frames
- Inngest Step 2 — `generated-screen-{i}` (per screen): Gemini generates full HTML/CSS with Unsplash tool calling
- Each generated frame is saved to DB, published to the frontend, and rendered in a device frame
- Redis cache is invalidated for the project; generation marked complete
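The step ordering above can be sketched with a toy stand-in for Inngest's `step.run` (real steps are durable and retried; this mock only preserves ordering, and the plan/render functions are parameters rather than actual Gemini calls):

```typescript
// Toy orchestration sketch mirroring the step names listed above.
type Step = { run<T>(name: string, fn: () => Promise<T>): Promise<T> };

async function generateScreens(
  step: Step,
  prompt: string,
  planFn: (p: string) => Promise<string[]>,  // Phase 1: returns screen ids
  renderFn: (id: string) => Promise<string>, // Phase 2: HTML per screen
): Promise<Record<string, string>> {
  // Step 1: analysis must finish before any screen generation starts.
  const screenIds = await step.run("analyze-and-plan-screen", () => planFn(prompt));
  const frames: Record<string, string> = {};
  // Step 2: one named step per screen, so retries resume per-screen.
  for (let i = 0; i < screenIds.length; i++) {
    frames[screenIds[i]] = await step.run(`generated-screen-${i}`, () =>
      renderFn(screenIds[i])
    );
  }
  return frames;
}
```

Naming each screen's generation as its own step is what lets a retry skip already-completed screens instead of regenerating everything.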
Database Schema
- User: Authentication user with email, password (hashed via bcryptjs), OAuth accounts, and subscription reference
- Account: OAuth provider accounts (Google) linked to users
- Project: Design project with name, selected theme ID, thumbnail URL, and associated user
- Frame: Individual screen within a project — stores the title and raw HTML content
- Subscription: Polar.sh subscription tracking with plan ID, status, billing period, and cancellation state
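A fragment of that schema might look like the following in Prisma's schema language (only the fields named above are from the source; attribute choices and extra fields like `createdAt` are illustrative assumptions):

```prisma
// Hypothetical fragment, not the project's actual schema file.
model Project {
  id        String   @id @default(cuid())
  name      String
  themeId   String
  thumbnail String?
  userId    String
  user      User     @relation(fields: [userId], references: [id])
  frames    Frame[]
  createdAt DateTime @default(now())
}

model Frame {
  id          String  @id @default(cuid())
  title       String
  htmlContent String  // raw AI-generated HTML for the screen
  projectId   String
  project     Project @relation(fields: [projectId], references: [id])
}
```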
Technical Challenges & Solutions
Challenge 1: Consistent AI-Generated UI Quality
- Problem: LLMs tend to generate generic, Bootstrap-like HTML that doesn't look premium or design-forward
- Solution: Crafted an extensive system prompt (`GENERATION_SYSTEM_PROMPT`) that enforces Dribbble-quality standards — glassmorphism, soft glows, generous rounding, layered cards, floating navigation, gradient accents, and z-index layering. Included concrete HTML examples for SVG charts (area, circular progress, donut) so the AI produces consistent, visually stunning output
Challenge 2: Theme Consistency Across Screens
- Problem: Generating multiple screens independently often leads to inconsistent styling — different fonts, spacing, nav patterns, and color usage
- Solution: The two-phase pipeline solves this — Phase 1 creates a unified plan with explicit bottom navigation specs (icon names, active states, styling) that every screen must follow. The `visualDescription` field is hyper-specific, including exact Tailwind classes, icon names, and layout rules. For iterative generation, existing HTML is injected as context
Challenge 3: Real Images Without Hallucination
- Problem: AI models hallucinate image URLs that return 404s, breaking the visual output
- Solution: Integrated Unsplash as a Vercel AI SDK tool (`searchUnsplash`). During generation, Gemini calls the tool with a search query, and the tool returns real Unsplash image URLs. Avatars use pravatar.cc with deterministic user IDs. The system prompt explicitly prohibits hallucinated image URLs
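The two image strategies can be sketched as plain URL helpers (exact URL shapes are assumptions; the real `searchUnsplash` tool would call the Unsplash API with an access key and return photo URLs from the response):

```typescript
// Sketch: real image URLs instead of hallucinated ones.

// Builds an Unsplash search request the tool's execute step might issue.
function unsplashSearchUrl(query: string, perPage = 3): string {
  const params = new URLSearchParams({ query, per_page: String(perPage) });
  return `https://api.unsplash.com/search/photos?${params}`;
}

// Deterministic pravatar.cc avatars: the same user id always yields the
// same URL, so screens stay stable across regenerations and never 404.
function avatarUrl(userId: number): string {
  return `https://i.pravatar.cc/150?u=${userId}`;
}
```

Because every image URL is either returned by a real API call or constructed from a service that resolves any id, the model never has to invent a URL.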
Challenge 4: Real-time Progress Without WebSockets
- Problem: Screen generation takes 30–90 seconds; users need live feedback, but setting up WebSocket infrastructure is complex
- Solution: Leveraged Inngest's built-in realtime pub/sub. The background function `publish()`es events at each stage; the frontend subscribes via `@inngest/realtime/hooks`. No WebSocket server, no polling — just event-driven streaming with automatic reconnection
Challenge 5: Screenshot Export in Serverless
- Problem: Puppeteer requires a full Chromium binary (~280MB) which exceeds serverless function size limits
- Solution: Used `@sparticuz/chromium-min` (a stripped-down Chromium binary optimized for Lambda/serverless) paired with `puppeteer-core` in production, while keeping full `puppeteer` for local development. The HTML is wrapped with all required CSS variables, fonts, and viewport settings before capture
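The dev/prod split can be sketched as a pure options selector (option names mirror Puppeteer's launch options; the serverless flags and the stubbed-in Chromium path are assumptions, since the real path comes from `@sparticuz/chromium-min`'s resolver):

```typescript
// Sketch: choose browser launch options by environment.
type LaunchOptions = {
  headless: boolean;
  args: string[];
  executablePath?: string;
};

function browserOptions(isProd: boolean, chromiumPath?: string): LaunchOptions {
  if (isProd) {
    return {
      headless: true,
      // Flags commonly required in serverless sandboxes:
      args: ["--no-sandbox", "--disable-setuid-sandbox"],
      executablePath: chromiumPath, // from the serverless chromium package
    };
  }
  // Local dev: full puppeteer resolves its own bundled Chromium,
  // so no executablePath is needed.
  return { headless: true, args: [] };
}
```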
Challenge 6: Subscription-Gated Feature Enforcement
- Problem: Need to enforce different limits (projects, screens per project, generations per month, available themes) across three plan tiers without duplicating logic
- Solution: Centralized plan definitions in `constant/plans.ts` with a `getUserPlan()` function that resolves the active plan from the Subscription table — handling active, canceled-but-within-period, and expired states. Helper functions `canCreateProject()` and `canGenerateScreen()` are called before any create/generate action
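The gating logic can be sketched as follows (the Free-tier numbers come from the pricing section above; the Pro and Unlimited limits here are assumptions, not the real `constant/plans.ts` values):

```typescript
// Sketch of centralized plan limits and gating helpers.
type PlanId = "free" | "pro" | "unlimited";

const PLANS: Record<PlanId, { maxProjects: number; maxGenerationsPerMonth: number }> = {
  free: { maxProjects: 2, maxGenerationsPerMonth: 10 },
  pro: { maxProjects: 20, maxGenerationsPerMonth: 200 },          // assumed limits
  unlimited: { maxProjects: Infinity, maxGenerationsPerMonth: Infinity },
};

type Subscription = { planId: PlanId; status: "active" | "canceled"; periodEnd: Date };

// Canceled subscriptions keep their plan until the paid period ends.
function getUserPlan(sub: Subscription | null, now = new Date()): PlanId {
  if (!sub) return "free";
  if (sub.status === "active") return sub.planId;
  return now < sub.periodEnd ? sub.planId : "free";
}

function canCreateProject(planId: PlanId, projectCount: number): boolean {
  return projectCount < PLANS[planId].maxProjects;
}
```

Keeping limits in one table means every create/generate action checks the same source of truth, so tier rules never drift between routes.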
After Launch & Impact
- Built a complete AI-powered SaaS with a novel two-phase generation pipeline that produces consistently high-quality mobile UI
- Designed and implemented 22 production-ready CSS variable themes covering light, dark, neon, glassmorphism, and brutalist aesthetics
- Integrated seven third-party services (NextAuth, Gemini, Inngest, Prisma/Neon, Polar.sh, Upstash Redis, Unsplash) into a cohesive architecture
- Implemented real-time event streaming for live generation progress without WebSocket infrastructure
- Built an interactive canvas with zoom/pan/pinch, device frame rendering, and floating toolbars
- Achieved server-side PNG export using Puppeteer with serverless-optimized Chromium
Future Plans
- Add collaborative editing with shared project links
- Support additional device frames (Android, iPad, Apple Watch)
- Implement screen-to-code export (React Native, Flutter, SwiftUI)
- Add AI-powered design iteration ("make the header more minimal", "change the chart to a bar chart")
- Build a public template gallery of community-generated designs
- Add version history and undo/redo for generated screens
- Implement A/B testing view to compare theme variations side-by-side
- Add Figma plugin for exporting designs directly to Figma
- Support custom theme creation with a visual theme editor