
CanvasX
AI-powered mobile UI/UX design tool that generates beautiful, production-ready mobile app screens from text prompts in minutes — powered by Google Gemini AI and real-time streaming.
Key Challenges
- Two-Phase AI Generation Pipeline
- Real-time Streaming with Inngest
- 22-Theme CSS Variable System
- AI Prompt Engineering for Pixel-Perfect HTML
- Interactive Canvas with Zoom/Pan/Pinch
- Server-Side Screenshot Export via Puppeteer
- Subscription-Gated Feature Access
- Rate Limiting & Redis Caching
- Iterative Screen Generation with Context
Key Learnings
- Google Gemini AI Integration (Structured Output + Tool Use)
- Inngest Background Jobs & Realtime Streaming
- Advanced AI Prompt Engineering for UI Generation
- CSS Variable-Based Theming Architecture
- Prisma ORM with NeonDB
- Polar.sh Payment & Subscription Integration
- NextAuth v5 (Google OAuth + Credentials)
- Puppeteer Server-Side Rendering
- Upstash Redis Caching & Rate Limiting
- Vercel AI SDK with Tool Calling
CanvasX: AI-Powered Mobile UI/UX Design Tool
Overview
CanvasX is a full-stack AI application that transforms text prompts into Dribbble-quality, production-ready mobile app screens in under 90 seconds. Built with Next.js 16 and powered by Google Gemini 2.5 Flash, it uses a two-phase AI generation pipeline — first planning the screen architecture, then generating pixel-perfect HTML/CSS — all streamed to the user in real-time via Inngest. Users describe the app they want, pick from 22 curated design themes, and watch their screens materialize live inside interactive iPhone device frames on a zoomable canvas.
What Users Can Do
- Text-to-UI Generation: Describe any mobile app idea in natural language and get 2–3 production-ready screens generated automatically
- 22 Built-in Themes: Choose from curated themes like Ocean Breeze, Netflix Dark, Acid Lime, Neo-Brutalism, Glassmorphism, Cyber, Midnight, and more
- Interactive Canvas: Zoom, pan, and pinch to navigate generated screens rendered inside realistic iPhone device frames
- Real-time Streaming: Watch screens generate live with progress stages — analyzing, planning, generating, completing
- Iterative Design: Add more screens to existing projects with context-aware generation that maintains design consistency
- Theme Switching: Instantly switch between all 22 themes on the canvas to preview different visual styles
- PNG Export: Download high-quality screenshots of individual screens via server-side Puppeteer rendering
- HTML Code View: Inspect and copy the raw HTML/CSS code behind any generated screen
- Project Management: Create, browse, and manage multiple design projects from a unified dashboard
- Subscription Plans: Free (2 projects, 10 generations/month), Pro ($6/month), and Unlimited ($20/month) tiers
Why I Built This
I built CanvasX to solve real pain points in the early-stage mobile app design process:
- Design Bottleneck: Turning an app idea into visual mockups traditionally requires hours of manual design work or expensive tools
- Tool Complexity: Tools like Figma have a steep learning curve for developers who just want to visualize their ideas quickly
- Inconsistency: Manually designing multiple screens often leads to inconsistent styling, spacing, and component patterns
- Theme Exploration: Trying different visual styles on a design is time-consuming — you have to manually restyle everything
- No AI-Native Design Tool: Existing AI tools generate single static images, not structured, themeable, production-ready HTML/CSS screens
Tech Stack
Frontend
- Next.js 16 (App Router + Turbopack): React framework for both UI and API routes
- TypeScript: End-to-end type safety with Prisma-generated types
- Tailwind CSS 4: Utility-first styling for the application UI
- Shadcn UI: Accessible component primitives (Radix UI)
- Motion (Framer Motion): Page transitions and micro-animations
- React Zoom Pan Pinch: Interactive canvas with zoom, pan, and pinch gestures
- React Resizable Panels: Adjustable layout panels in the editor
- TanStack Query: Server state management with automatic cache invalidation
- React Context API: Client-side state for canvas, frames, themes, and generation status
Backend & Services
- Google Gemini 2.5 Flash: AI model for screen analysis/planning and HTML generation
- Vercel AI SDK: Unified interface for AI model calls with structured output and tool use
- Unsplash API: AI tool integration — Gemini calls `searchUnsplash` to find real images during generation
- NeonDB (Serverless PostgreSQL): Primary database for users, projects, frames, and subscriptions
- Prisma ORM: Type-safe database client with migrations
- Inngest: Durable background job execution with realtime event streaming
- NextAuth v5: Authentication with Google OAuth and credentials provider
- Puppeteer / Puppeteer-Core: Server-side headless browser for PNG screenshot export
- Upstash Redis: Response caching and API rate limiting
- Polar.sh: Payment processing and subscription lifecycle management
- DOMPurify + JSDOM: Server-side HTML sanitization for AI-generated content
Key Features
Two-Phase AI Generation Pipeline
The core innovation of CanvasX is splitting screen generation into two distinct AI phases, each optimized for its role:
- Phase 1 — Analysis & Planning: The user's prompt is sent to Gemini with the `ANALYSIS_PROMPT` system instruction. Gemini returns a structured JSON object (validated via Zod) containing:
  - `theme`: Best-matching theme ID from 22 options
  - `screens[]`: Array of 2–3 screen specs, each with `id`, `name`, `purpose`, and a highly detailed `visualDescription` covering exact layout, chart types, icon names, data values, and bottom navigation configuration
- Phase 2 — HTML Generation: For each planned screen, a second Gemini call generates self-contained HTML/CSS using:
  - Tailwind v3 utility classes for layout and styling
  - CSS custom properties (`var(--primary)`, `var(--background)`, etc.) for theme colors
  - SVG-only charts (area, line, circular progress, donut — never canvas or divs)
  - Iconify icons (`lucide:*` set) for all iconography
  - Real images via `searchUnsplash` tool calling
  - Realistic placeholder data ("8,432 steps", "$12.99", "7h 20m" — not generic text)
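The Phase 1 contract can be sketched as a plain TypeScript type with a hand-rolled validator standing in for the project's Zod schema (field names follow the description above; everything else here is illustrative):

```typescript
// Hypothetical sketch of the Phase 1 plan shape; the real project
// validates this with Zod, this is a minimal stand-in.
type ScreenSpec = {
  id: string;
  name: string;
  purpose: string;
  visualDescription: string; // exact layout, charts, icons, data, bottom nav
};

type ScreenPlan = {
  theme: string;         // one of the 22 theme IDs
  screens: ScreenSpec[]; // 2–3 screen specs
};

// Minimal validator: rejects malformed JSON plans before Phase 2 runs.
function parsePlan(raw: string): ScreenPlan {
  const data = JSON.parse(raw);
  if (typeof data.theme !== "string") throw new Error("plan.theme must be a string");
  if (!Array.isArray(data.screens) || data.screens.length < 2 || data.screens.length > 3) {
    throw new Error("plan.screens must contain 2-3 screens");
  }
  for (const s of data.screens) {
    for (const key of ["id", "name", "purpose", "visualDescription"]) {
      if (typeof s[key] !== "string") throw new Error(`screen.${key} must be a string`);
    }
  }
  return data as ScreenPlan;
}
```

Validating the plan between the two phases means a malformed Phase 1 response fails fast instead of producing broken HTML in Phase 2.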
22-Theme CSS Variable Architecture
- Each theme defines ~20 CSS custom properties: `--background`, `--foreground`, `--card`, `--primary`, `--accent`, `--muted`, `--border`, `--chart-1` through `--chart-5`, and more
- Themes range from light (Ocean Breeze, Swiss Style, Peach) to dark (Netflix, Acid Lime, Cyber, Midnight, Neon) to special effects (Glassmorphism, Neo-Brutalism)
- Base variables provide shared typography (`--font-sans`, `--font-heading`, `--font-serif`, `--font-mono`) and shadow scales
- Theme CSS is injected into each frame's wrapper HTML, making generated screens instantly re-themeable without regeneration
- The AI is instructed to use CSS variables for all foundational colors, ensuring perfect theme adherence
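The injection step can be sketched as a small helper that wraps a frame's HTML with a `:root` block of theme variables (variable names match the list above; the wrapper structure is an assumption, not the project's actual code):

```typescript
// Sketch: inject theme variables into a frame's wrapper HTML so the
// same generated screen can be re-themed without regeneration.
type Theme = Record<string, string>; // e.g. { "--primary": "#6d28d9", ... }

function wrapFrameHtml(theme: Theme, frameHtml: string): string {
  const vars = Object.entries(theme)
    .map(([name, value]) => `  ${name}: ${value};`)
    .join("\n");
  return `<!doctype html>
<html>
<head>
  <meta name="viewport" content="width=device-width, initial-scale=1" />
  <style>:root {\n${vars}\n}</style>
</head>
<body>${frameHtml}</body>
</html>`;
}
```

Switching themes only swaps the `:root` block; because the generated markup references `var(--primary)` and friends, the identical HTML re-renders in a new palette.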
Real-time Generation Streaming
- Inngest's `@inngest/realtime` middleware publishes live progress events on a per-user channel (`user:{userId}`)
- Event stages flow to the frontend: `generation.start` → `analysis.start` → `analysis.complete` (skeleton frames appear with loading states) → `frame.created` (each screen renders as it completes) → `generation.complete`
- The `RealtimeProvider` context subscribes to events, updates frame state, and manages loading/error UI
- A fallback timeout (60s) catches cases where the backend never responds; a generation timeout (5 min) catches stuck jobs
- Toast notifications provide real-time feedback for completion and error states
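How the provider could fold those streamed stages into frame state can be sketched as a plain reducer (event names come from the list above; the state shape is an assumption, not the project's actual context):

```typescript
// Sketch: reduce streamed generation events into frame state.
type Frame = { id: string; html?: string; loading: boolean };
type GenState = { stage: string; frames: Frame[] };

type GenEvent =
  | { type: "generation.start" }
  | { type: "analysis.start" }
  | { type: "analysis.complete"; screenIds: string[] } // skeletons appear
  | { type: "frame.created"; id: string; html: string }
  | { type: "generation.complete" };

function reduce(state: GenState, event: GenEvent): GenState {
  switch (event.type) {
    case "analysis.complete":
      // Show one loading skeleton per planned screen.
      return {
        stage: event.type,
        frames: event.screenIds.map((id) => ({ id, loading: true })),
      };
    case "frame.created":
      // Fill in the matching skeleton as each screen completes.
      return {
        stage: event.type,
        frames: state.frames.map((f) =>
          f.id === event.id ? { ...f, html: event.html, loading: false } : f
        ),
      };
    default:
      return { ...state, stage: event.type };
  }
}
```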
Context-Aware Iterative Generation
- When adding screens to an existing project, the system sends the last 4 frames' HTML as `CONTEXT HTML` to the analysis prompt
- This ensures new screens maintain visual consistency: matching bottom navigation, consistent component styles, coherent color usage, and aligned spacing
- The existing theme is preserved — the AI doesn't re-select a theme for iterative generations
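Assembling that context can be sketched as a sliding-window concatenation (the 4-frame window comes from the description above; the delimiter format is an assumption):

```typescript
// Sketch: build CONTEXT HTML from the most recent frames so iterative
// generations can match existing navigation, components, and spacing.
type StoredFrame = { title: string; htmlContent: string };

function buildContextHtml(frames: StoredFrame[], window = 4): string {
  return frames
    .slice(-window) // only the newest frames, to keep the prompt small
    .map((f) => `<!-- screen: ${f.title} -->\n${f.htmlContent}`)
    .join("\n\n");
}
```

Capping the window bounds prompt size, so context cost stays flat no matter how large a project grows.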
Interactive Canvas & Device Frames
- Generated HTML is rendered inside realistic iPhone device frames using iframe sandboxing
- `react-zoom-pan-pinch` provides smooth canvas navigation with zoom controls
- A floating toolbar offers actions: theme switching, HTML code view, screenshot export
- Each frame has its own toolbar for individual actions (download, view code, delete)
- Canvas supports multiple frames laid out in a responsive grid
Server-Side Screenshot Export
- PNG export uses Puppeteer (full) in development and `@sparticuz/chromium-min` + Puppeteer-Core in production (for serverless environments)
- The frame's HTML is wrapped in a complete HTML document with theme CSS variables, viewport meta tags, and Google Fonts
- Screenshots are captured at iPhone dimensions with proper device pixel ratio for high-quality output
AI-Powered Project Naming
- When a user submits a prompt, a separate Gemini call generates a concise project name (under 5 words) based on the prompt content
- This runs as a server action before project creation, giving every project a meaningful name automatically
Technical Implementation
Generation Flow
- User types a prompt on the home page (e.g., "Fitness tracker app with dark theme")
- Server action calls Gemini to generate a short project name
- Project record is created in NeonDB via Prisma
- Inngest event `ui/generate.screen` is triggered with prompt, projectId, userId, and existing frames (if any)
- User is navigated to `/project/[id]`, where the canvas UI subscribes to realtime events
- Inngest Step 1 — `analyze-and-plan-screen`: Gemini analyzes the prompt → returns JSON plan with theme + screen specs
- Project's theme is updated in DB; frontend receives skeleton frames
- Inngest Step 2 — `generated-screen-{i}` (per screen): Gemini generates full HTML/CSS with Unsplash tool calling
- Each generated frame is saved to DB, published to the frontend, and rendered in a device frame
- Redis cache is invalidated for the project; generation marked complete
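The step ordering above can be sketched with a toy stand-in for Inngest's `step.run` (real steps are durable and retried; this mock only preserves ordering, and the plan/render functions are parameters rather than actual Gemini calls):

```typescript
// Toy orchestration sketch mirroring the step names listed above.
type Step = { run<T>(name: string, fn: () => Promise<T>): Promise<T> };

async function generateScreens(
  step: Step,
  prompt: string,
  planFn: (p: string) => Promise<string[]>,  // Phase 1: returns screen ids
  renderFn: (id: string) => Promise<string>, // Phase 2: HTML per screen
): Promise<Record<string, string>> {
  // Step 1: analysis must finish before any screen generation starts.
  const screenIds = await step.run("analyze-and-plan-screen", () => planFn(prompt));
  const frames: Record<string, string> = {};
  // Step 2: one named step per screen, so retries resume per-screen.
  for (let i = 0; i < screenIds.length; i++) {
    frames[screenIds[i]] = await step.run(`generated-screen-${i}`, () =>
      renderFn(screenIds[i])
    );
  }
  return frames;
}
```

Naming each screen's generation as its own step is what lets a retry skip already-completed screens instead of regenerating everything.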
Database Schema
- User: Authentication user with email, password (hashed via bcryptjs), OAuth accounts, and subscription reference
- Account: OAuth provider accounts (Google) linked to users
- Project: Design project with name, selected theme ID, thumbnail URL, and associated user
- Frame: Individual screen within a project — stores the title and raw HTML content
- Subscription: Polar.sh subscription tracking with plan ID, status, billing period, and cancellation state
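A fragment of that schema might look like the following in Prisma's schema language (only the fields named above are from the source; attribute choices and extra fields like `createdAt` are illustrative assumptions):

```prisma
// Hypothetical fragment, not the project's actual schema file.
model Project {
  id        String   @id @default(cuid())
  name      String
  themeId   String
  thumbnail String?
  userId    String
  user      User     @relation(fields: [userId], references: [id])
  frames    Frame[]
  createdAt DateTime @default(now())
}

model Frame {
  id          String  @id @default(cuid())
  title       String
  htmlContent String  // raw AI-generated HTML for the screen
  projectId   String
  project     Project @relation(fields: [projectId], references: [id])
}
```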
Technical Challenges & Solutions
Challenge 1: Consistent AI-Generated UI Quality
- Problem: LLMs tend to generate generic, Bootstrap-like HTML that doesn't look premium or design-forward
- Solution: Crafted an extensive system prompt (`GENERATION_SYSTEM_PROMPT`) that enforces Dribbble-quality standards — glassmorphism, soft glows, generous rounding, layered cards, floating navigation, gradient accents, and z-index layering. Included concrete HTML examples for SVG charts (area, circular progress, donut) so the AI produces consistent, visually stunning output
Challenge 2: Theme Consistency Across Screens
- Problem: Generating multiple screens independently often leads to inconsistent styling — different fonts, spacing, nav patterns, and color usage
- Solution: The two-phase pipeline solves this — Phase 1 creates a unified plan with explicit bottom navigation specs (icon names, active states, styling) that every screen must follow. The `visualDescription` field is hyper-specific, including exact Tailwind classes, icon names, and layout rules. For iterative generation, existing HTML is injected as context
Challenge 3: Real Images Without Hallucination
- Problem: AI models hallucinate image URLs that return 404s, breaking the visual output
- Solution: Integrated Unsplash as a Vercel AI SDK tool (`searchUnsplash`). During generation, Gemini calls the tool with a search query, and the tool returns real Unsplash image URLs. Avatars use pravatar.cc with deterministic user IDs. The system prompt explicitly prohibits hallucinated image URLs
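The two image strategies can be sketched as plain URL helpers (exact URL shapes are assumptions; the real `searchUnsplash` tool would call the Unsplash API with an access key and return photo URLs from the response):

```typescript
// Sketch: real image URLs instead of hallucinated ones.

// Builds an Unsplash search request the tool's execute step might issue.
function unsplashSearchUrl(query: string, perPage = 3): string {
  const params = new URLSearchParams({ query, per_page: String(perPage) });
  return `https://api.unsplash.com/search/photos?${params}`;
}

// Deterministic pravatar.cc avatars: the same user id always yields the
// same URL, so screens stay stable across regenerations and never 404.
function avatarUrl(userId: number): string {
  return `https://i.pravatar.cc/150?u=${userId}`;
}
```

Because every image URL is either returned by a real API call or constructed from a service that resolves any id, the model never has to invent a URL.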
Challenge 4: Real-time Progress Without WebSockets
- Problem: Screen generation takes 30–90 seconds; users need live feedback, but setting up WebSocket infrastructure is complex
- Solution: Leveraged Inngest's built-in realtime pub/sub. The background function `publish()`es events at each stage; the frontend subscribes via `@inngest/realtime/hooks`. No WebSocket server, no polling — just event-driven streaming with automatic reconnection
Challenge 5: Screenshot Export in Serverless
- Problem: Puppeteer requires a full Chromium binary (~280MB) which exceeds serverless function size limits
- Solution: Used `@sparticuz/chromium-min` (a stripped-down Chromium binary optimized for Lambda/serverless) paired with `puppeteer-core` in production, while keeping full `puppeteer` for local development. The HTML is wrapped with all required CSS variables, fonts, and viewport settings before capture
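The dev/prod split can be sketched as a pure options selector (option names mirror Puppeteer's launch options; the serverless flags and the stubbed-in Chromium path are assumptions, since the real path comes from `@sparticuz/chromium-min`'s resolver):

```typescript
// Sketch: choose browser launch options by environment.
type LaunchOptions = {
  headless: boolean;
  args: string[];
  executablePath?: string;
};

function browserOptions(isProd: boolean, chromiumPath?: string): LaunchOptions {
  if (isProd) {
    return {
      headless: true,
      // Flags commonly required in serverless sandboxes:
      args: ["--no-sandbox", "--disable-setuid-sandbox"],
      executablePath: chromiumPath, // from the serverless chromium package
    };
  }
  // Local dev: full puppeteer resolves its own bundled Chromium,
  // so no executablePath is needed.
  return { headless: true, args: [] };
}
```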
Challenge 6: Subscription-Gated Feature Enforcement
- Problem: Need to enforce different limits (projects, screens per project, generations per month, available themes) across three plan tiers without duplicating logic
- Solution: Centralized plan definitions in `constant/plans.ts` with a `getUserPlan()` function that resolves the active plan from the Subscription table — handling active, canceled-but-within-period, and expired states. Helper functions `canCreateProject()` and `canGenerateScreen()` are called before any create/generate action
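The gating logic can be sketched as follows (the Free-tier numbers come from the pricing section above; the Pro and Unlimited limits here are assumptions, not the real `constant/plans.ts` values):

```typescript
// Sketch of centralized plan limits and gating helpers.
type PlanId = "free" | "pro" | "unlimited";

const PLANS: Record<PlanId, { maxProjects: number; maxGenerationsPerMonth: number }> = {
  free: { maxProjects: 2, maxGenerationsPerMonth: 10 },
  pro: { maxProjects: 20, maxGenerationsPerMonth: 200 },          // assumed limits
  unlimited: { maxProjects: Infinity, maxGenerationsPerMonth: Infinity },
};

type Subscription = { planId: PlanId; status: "active" | "canceled"; periodEnd: Date };

// Canceled subscriptions keep their plan until the paid period ends.
function getUserPlan(sub: Subscription | null, now = new Date()): PlanId {
  if (!sub) return "free";
  if (sub.status === "active") return sub.planId;
  return now < sub.periodEnd ? sub.planId : "free";
}

function canCreateProject(planId: PlanId, projectCount: number): boolean {
  return projectCount < PLANS[planId].maxProjects;
}
```

Keeping limits in one table means every create/generate action checks the same source of truth, so tier rules never drift between routes.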
After Launch & Impact
- Built a complete AI-powered SaaS with a novel two-phase generation pipeline that produces consistently high-quality mobile UI
- Designed and implemented 22 production-ready CSS variable themes covering light, dark, neon, glassmorphism, and brutalist aesthetics
- Integrated seven third-party services (NextAuth, Gemini, Inngest, Prisma/Neon, Polar.sh, Upstash Redis, Unsplash) into a cohesive architecture
- Implemented real-time event streaming for live generation progress without WebSocket infrastructure
- Built an interactive canvas with zoom/pan/pinch, device frame rendering, and floating toolbars
- Achieved server-side PNG export using Puppeteer with serverless-optimized Chromium
Future Plans
- Add collaborative editing with shared project links
- Support additional device frames (Android, iPad, Apple Watch)
- Implement screen-to-code export (React Native, Flutter, SwiftUI)
- Add AI-powered design iteration ("make the header more minimal", "change the chart to a bar chart")
- Build a public template gallery of community-generated designs
- Add version history and undo/redo for generated screens
- Implement A/B testing view to compare theme variations side-by-side
- Add Figma plugin for exporting designs directly to Figma
- Support custom theme creation with a visual theme editor