
The agentic design stack for non-designers

This is for marketers. For non-designers. For people — like me — who don’t think visually first and don’t want to tweak layouts on a canvas in Figma, Photoshop, Pencil.dev, or Paper.design. People who would rather describe what they want and let AI build it.

In a companion piece, 5 ways to create differentiated design in 2026, I covered five techniques that make a marketing site stand out. The common thread: they all require stepping outside Figma and into code. This article covers the AI tooling that makes that step possible — without a designer and without a front-end developer.


Three layers, not one tool

The mistake most people make is thinking this is about picking the right AI model. It is not. The stack that produces professional-quality design has three distinct layers, and understanding what each does is the difference between generic output and something that looks good.

  1. Models that generate code — turn your description into working HTML, CSS, and JavaScript
  2. Models that generate images — create the visual assets that go on the page
  3. Skills that improve output quality — instruction sets that give models design taste

Each layer solves a different problem. You need all three working together.

Layer 1: Models that generate code

This is what Andrej Karpathy dubbed “vibe coding” when he coined the term in early 2025:

“There’s a new kind of coding I call ‘vibe coding’, where you fully give in to the vibes, embrace exponentials, and forget that the code even exists.” — Andrej Karpathy

The three models that matter for generating front-end code are Claude (via Claude Code or Cursor), Gemini 3.1 Pro (via Google AI Studio), and GPT (via ChatGPT or the API). All three can produce production-ready HTML, CSS, and JavaScript from a text description.

When people say “I used Gemini 3.1 Pro for design,” they mean generating working front-end code, not generating images. Gemini 3.1 Pro is a reasoning model. The image generation models in Google’s family are separate products (see Layer 2).

Layer 2: Models that generate images

This layer handles the visual assets: hero backgrounds, textures, product mockups, and atmospheric imagery.

The rankings have changed fast. Based on the LM Arena text-to-image leaderboard — a blind benchmark where humans compare outputs without knowing which model produced them — here is the current top tier:

Model                            Elo     Best for
Nano Banana 2 (Google)           1280    Fast iteration, near-Pro quality at 3–5x the speed
GPT Image 1.5 (OpenAI)           1248    Text-heavy designs, complex compositions
Nano Banana Pro (Google)         1238    Studio-quality precision, detailed scenes
Flux 2 Max (Black Forest Labs)   1168    Open-source flexibility, photorealism
Recraft V4                       1098    Native SVG vector output, design systems

A few things worth noting. Midjourney V7 does not appear on the leaderboard — it operates as a closed ecosystem with no public API — but still produces some of the most aesthetically refined output for creative work.

Vector graphics is becoming its own category. Recraft V4 generates native SVG — useful for icons, design system assets, and brand work. Quiver.ai, backed by a16z and built by the researchers behind StarVector, launched in February 2026 with a dedicated model (Arrow-1.0) for generating production-ready SVG logos and illustrations from text. Both offer API access, so they fit the same describe-and-generate workflow.

For marketing use cases, the pattern is multi-model: use GPT Image 1.5 or Nano Banana for general imagery, Midjourney for hero visuals where aesthetics matter most, and Recraft or Quiver for anything that needs to be a vector.

Layer 3: Skills that improve output quality

This is the least understood layer and the one that matters most. Without it, every AI-generated website looks the same — the same Inter font, the same purple gradient, the same rounded card grid.

Paul Bakaus, creator of Impeccable and formerly a Senior Developer Advocate at Google, puts it directly:

“Great design prompts require design vocabulary. Most people don’t have it.” — Paul Bakaus

Skills are instruction sets that teach code-generation models how to make better design decisions. They encode knowledge about typography, colour, spacing, motion, and interaction patterns — the vocabulary that designers spend years developing. As Bakaus describes the problem: every LLM learned from the same generic templates, and without guidance, you get the same predictable mistakes.

The three skills worth knowing about:

Anthropic’s frontend-design skill is built into Claude Code. It tells the model to commit to a bold aesthetic direction before writing code, avoid generic fonts, develop cohesive colour hierarchies, and use motion for polish. It is the baseline.

Impeccable by Paul Bakaus is a drop-in upgrade that adds 17 design commands and a library of anti-patterns — explicit instructions on what to avoid (grey text on coloured backgrounds, overused fonts, nested cards). It works with Claude Code, Cursor, and Gemini CLI.

Google Stitch Skills connects AI coding agents to Google Stitch, Google’s AI-powered UI design tool. It can generate React component systems, create multi-page websites from a single prompt, and produce walkthrough videos from designs.

The skills layer is what separates “I built this with AI” output that looks generic from output that looks professionally designed.

The practical workflow

For a concrete example: say you want to rebuild your marketing homepage to the standard described in the companion article.

  1. Start with skills. Install Anthropic’s frontend-design skill and Impeccable. This sets the design baseline before you write a single prompt.
  2. Describe with specificity. Instead of “build me a hero section,” describe the aesthetic: “a hero section with high contrast, confident display typography, dark-on-light with one sharp accent colour, and a testimonial ticker beneath the CTA.” The more specific your direction, the better the output.
  3. Generate images separately. Use GPT Image 1.5 or Nano Banana for any visual assets — hero backgrounds, atmospheric textures, product mockups. Feed these into your code as static assets.
  4. Iterate with the model. Review the output, identify what is off, and direct the next iteration. “The spacing between the feature cards is too tight. Add more vertical breathing room and make the hover effect more subtle.”

This is how I built the Growth Method site and a client project — both Astro.js, both without a designer. The first section took the longest as I dialled in the aesthetic direction. After that, each new section was faster because the model had the design system to work from.

The whole process for a single page section takes 15–30 minutes. A full marketing site with 6–8 distinct section patterns can be built in a focused week.

The same stack handles animation and video

The three layers above were framed around static design — layouts, typography, images. But the same stack handles animation, motion, and video.

Layer 1 generates the animation code. Claude, Gemini 3.1 Pro, and GPT already generate animation code fluently — SVG stroke animations, CSS keyframes, scroll-triggered reveals, parallax effects, page transitions. Describe a flow diagram that starts as a simple three-column layout and expands with detail cards on scroll. The model writes the Intersection Observer, the CSS transitions, and the connecting line animations. No animation library required for simple cases. For complex timelines, it generates GSAP or Framer Motion code just as easily.
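A scroll-triggered reveal of the kind described above boils down to a few lines. This is a minimal sketch of what the model typically writes, not code from the article — the `data-reveal` attribute, the `.is-visible` class, and the function names are illustrative assumptions:

```javascript
// Pure helper: pick the elements whose entries crossed the threshold.
// Kept separate from the browser APIs so the logic is easy to reason about.
function entriesToReveal(entries) {
  return entries.filter((entry) => entry.isIntersecting).map((entry) => entry.target);
}

// Wire up a one-shot reveal: when 20% of an element marked with
// [data-reveal] scrolls into view, add .is-visible (a CSS transition
// on that class does the actual animation) and stop observing it.
function setupScrollReveal(root = document) {
  const observer = new IntersectionObserver((entries, obs) => {
    for (const el of entriesToReveal(entries)) {
      el.classList.add('is-visible');
      obs.unobserve(el); // reveal once, then stop watching
    }
  }, { threshold: 0.2 });

  root.querySelectorAll('[data-reveal]').forEach((el) => observer.observe(el));
}
```

The matching CSS is a transition from `opacity: 0; transform: translateY(1rem)` to the `.is-visible` state — exactly the pattern a skilled prompt ("reveal each card with a short upward fade on scroll") tends to produce.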

Animated SVGs are a particular strength. Since SVG is just code, it fits the describe-and-generate workflow naturally. Self-drawing line effects, icon morphing, pulsing indicators, orbit animations — all producible from a text prompt with no external tools.
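The self-drawing line effect mentioned above relies on one trick: set `stroke-dasharray` and `stroke-dashoffset` to the path’s total length so the stroke starts hidden, then animate the offset to zero. A minimal sketch, assuming a hypothetical `selfDrawStyle` helper and a `draw` keyframe name of my choosing:

```javascript
// Build the inline style for a self-drawing path. One dash as long as
// the path hides the stroke; animating the offset to 0 "draws" it in.
// The helper name and the `draw` keyframe are illustrative assumptions.
function selfDrawStyle(pathLength, durationSeconds = 2) {
  return {
    strokeDasharray: String(pathLength),   // single dash covering the path
    strokeDashoffset: String(pathLength),  // start fully offset (invisible)
    animation: `draw ${durationSeconds}s ease forwards`,
  };
}

// The matching keyframe lives in the stylesheet:
//   @keyframes draw { to { stroke-dashoffset: 0; } }
//
// In the browser, read the real length and apply the style:
//   const path = document.querySelector('svg path');
//   Object.assign(path.style, selfDrawStyle(path.getTotalLength()));
```

Because the whole effect is a length, two style properties, and one keyframe, it is exactly the kind of thing a text prompt can specify precisely.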

Layer 2 expands from images to video. When you need rendered video rather than code-based animation — a product demo, an explainer clip, a social media ad — the image generation models have video counterparts. Veo 3.1 (Google), Runway Gen-4, Sora 2 (OpenAI), and Kling all offer API access. You describe the scene in text, the model renders the video. Same principle as image generation, just moving.

Layer 3 adds animation-specific skills. The most notable is Remotion — a framework for creating videos programmatically with React, with a Claude Code skill that has over 83,000 installs. You describe a video in plain English, Claude generates the React composition, and Remotion renders it to MP4. Creators report generating 20-second promo videos in under 20 minutes.

Other animation skills worth noting: GSAP skills that provide scroll animation patterns and timeline context, the freshtechbro animation bundles covering GSAP, Framer Motion, React Spring, and Lottie, and a LottieFiles MCP server that lets Claude search and embed existing Lottie animations by description.

Taste remains the bottleneck

The agentic design stack makes production accessible. But accessible production does not mean everyone will produce great work. The models can execute any aesthetic direction — the question is whether the direction is good.

This is the same dynamic described in the companion article: AI lowers the barrier, taste remains the moat. The stack gives you leverage, but the leverage only compounds if you have a clear point of view on what good design looks like for your brand.

The companies that will stand out are not the ones with the best tools. They are the ones with the clearest aesthetic direction and the willingness to push past the AI defaults.

