AI image generation for ads
The model is not the skill. The brief is. Today you learn to direct an image generator the way a creative director runs a shoot — so the output is on-brand, ad-ready, and built to feed the loop, not AI slop.
On-brand AI images are made by the brief, not the model. Six inputs you control — subject and product fidelity, style reference, composition, lighting, exact text, persona and angle — decide whether you get your ad or the internet's average.
1. The model is a commodity; the brief is the craft
Yesterday (Day 12) you turned on Meta's free, in-platform enhancements — background generation, image expansion, touch-ups — that reformat and remix the assets you upload. Today you make the assets themselves from a blank canvas. That is a different job and a different risk profile, so it gets its own day.
Here is the trap to disarm first. Founders read about "the best AI image model" and treat tool selection as the decision. It barely matters. You mapped the stack on Day 11 — the one-line refresher for images is a clear division of labour, and you'll use several in one pipeline:
- Midjourney — the highest-aesthetic hero shot.
- The current FLUX model — photoreal product and lifestyle.
- Ideogram or Recraft's latest release — when there's heavy text or a logo.
- ByteDance Seedream — cheap bulk variants.
- Google's current Gemini image model — the mid-2026 default for edits, variants, and locking brand consistency across a series.
Pick by the job in front of you, not by the leaderboard. Names will change — the division of labour (aesthetic hero / photoreal / text rendering / cheap bulk / consistent edits) is the durable map.
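As a minimal sketch of "pick by the job", the division of labour above can be written as a lookup table. The job keys and the `pick_model` helper are hypothetical names for illustration; the model names are simply the ones listed above and will date.

```python
# Hypothetical routing table: keys are the durable jobs, values are the
# tools named in this lesson (expect the values to change over time).
ROUTES = {
    "aesthetic_hero": "Midjourney",
    "photoreal": "FLUX (current model)",
    "text_rendering": "Ideogram or Recraft (latest release)",
    "cheap_bulk": "ByteDance Seedream",
    "consistent_edits": "Google Gemini image model (current)",
}


def pick_model(job: str) -> str:
    """Return the tool for a job, or fail loudly on an unknown job."""
    try:
        return ROUTES[job]
    except KeyError:
        raise ValueError(
            f"unknown job {job!r}; expected one of {sorted(ROUTES)}"
        ) from None


if __name__ == "__main__":
    # One pipeline can route different steps to different tools.
    for step in ("photoreal", "text_rendering", "consistent_edits"):
        print(step, "->", pick_model(step))
```

When a tool is replaced, only the value changes; the keys stay the same, which is the sense in which the division of labour is the durable map.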
This is liberating, because it means the durable skill is not "which model." It's how you brief. Two operators with identical access to the same model produce wildly different output: one ships generic stock-looking sludge, the other ships an ad that converts, because one wrote a brief and one wrote a wish. Every generic image you've ever scrolled past was a one-line prompt. Every on-brand one was a structured brief.
And remember why we're here. Back on Day 1 we established that creative is the last lever you own and that it fatigues — every winner decays as it scales. The point of generating images with AI is not to make one pretty picture; it's to feed the volume the loop needs (Day 5's explore/exploit, Day 10's matrix) without a studio budget per asset. Cheap volume only compounds if it's on-brand and tagged, so the brief is where quality and learnability both get protected.
2. The six inputs that make an image on-brand
A wish is "a serum bottle on a marble counter, sunlight." A brief specifies six inputs the model cannot guess. Miss any one and the model fills the gap with its training-data average — which is exactly what "AI slop" is: the statistical mean of every image like yours that ever existed. Bland by construction.
The six inputs:
- Subject & product fidelity: what exactly is in frame, and whether it is the real product. This is the non-negotiable one (Section 3).
- Style reference: the look, meaning a brand palette, a mood, and 2–4 reference images of past on-brand creative. Midjourney's --sref and style weight (--sw) parameters, or feeding Google's current Gemini image model your logo, brand colours and prior visuals, exist precisely to pin this down.
- Composition & format: framing, where the product sits, negative space for copy, and the aspect ratio you'll ship into (9:16 Reels, 4:5 Feed, or 1:1, your Day 10 formats).
- Lighting & treatment: here you make a deliberate Day 9 choice between hi-fi (studio, polished, controlled light) and lo-fi (phone-shot, on-a-kitchen-counter, "a friend posted this"). AI does both; the brief decides which.
- Text-in-image: if there's a headline or badge baked into the pixels, you specify the exact words. In 2026 the text-rendering specialists (Ideogram, Recraft) and the current Gemini image model render legible, correct text; older models still produce garbled glyphs. Route text-heavy work to a model that can spell.
- Persona & angle cues: the human the image implies (Day 7) and the message it carries (Day 8). A "budget-conscious parent, before/after relief" frame looks nothing like a "status-seeking professional, aspirational" frame, even for the same product.
A worked example. Say a skincare brand wants Feed creative for one concept across two personas. The wish — "serum on marble, sunlight" — gives you a forgettable stock photo on the first try and three near-identical re-rolls after. The brief gives you a generation matrix. Hold concept, product and palette fixed; vary persona, treatment and format: 2 personas × 2 treatments (hi-fi / lo-fi) × 2 ratios (4:5 / 9:16) = 8 tagged frames from one structured brief in an afternoon. At a studio shoot that's a half-day and four figures; here it's a few euros of credits. That is the leverage — but only because every frame is a deliberate, labelled point in the matrix, not eight rolls of the same dice.
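The matrix in the worked example can be enumerated mechanically. The sketch below holds concept, product and palette fixed and varies persona, treatment and ratio. The two personas are the ones named in the inputs list; the fixed values and the `frame-01` style IDs are placeholders, not a real brand's data.

```python
from itertools import product

# Held fixed across every frame (placeholder values for illustration).
FIXED = {
    "concept": "serum-hero",
    "product": "real-product-photo.png",
    "palette": "brand-palette-v1",
}

# Varied axes: 2 personas x 2 treatments x 2 ratios.
PERSONAS = ["budget-conscious parent", "status-seeking professional"]
TREATMENTS = ["hi-fi", "lo-fi"]
RATIOS = ["4:5", "9:16"]


def build_matrix() -> list[dict]:
    """Return one tagged frame spec per combination of the varied axes."""
    frames = []
    combos = product(PERSONAS, TREATMENTS, RATIOS)
    for i, (persona, treatment, ratio) in enumerate(combos, start=1):
        frames.append({
            **FIXED,
            "frame_id": f"frame-{i:02d}",  # hypothetical ID scheme
            "persona": persona,
            "treatment": treatment,
            "ratio": ratio,
        })
    return frames


frames = build_matrix()

if __name__ == "__main__":
    for f in frames:
        print(f["frame_id"], f["persona"], f["treatment"], f["ratio"])
```

Each dictionary is one labelled point in the matrix, so a result can later be attributed to a specific persona, treatment and ratio rather than to "the AI image".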
3. Product fidelity: composite, don't hallucinate
Here is the single most expensive mistake in AI ad imagery, and the one rule that separates amateur output from professional: never let the model invent your product.
If you describe your product in words and let the generator draw it, you will get a bottle, a sneaker, a dashboard — plausible, beautiful, and wrong. The label is gibberish, the logo is a melted approximation, the cap is the wrong shape, the device has six buttons instead of four. To you it's obviously not your product. To a customer who clicks through and sees the real thing, it's a bait-and-switch — and trust, the thing your whole funnel runs on, takes the hit. For regulated or premium categories it's worse: a hallucinated product can imply a feature you don't sell.
The fix is compositing. Generate the scene with AI (the marble, the light, the lifestyle context, a human model's hands) and place your actual product photograph into it. Modern editors make this a single step: you inpaint the real product onto an AI-generated background (inpainting = regenerating only a selected area, so the rest of the image is untouched; the mask is that selection), or mask it in and let the current Gemini image model colour-match the lighting so it sits naturally. The AI does what it's genuinely good at (cheap, infinite, plausible environments); the camera does what it must (an honest record of the thing you'll ship). The same logic protects faces and any text the customer must trust: generate the world, anchor the truth.
This same discipline scales hi-fi and lo-fi (Day 9) equally. A hi-fi composite drops the real bottle into a studio-lit AI set. A lo-fi composite drops the same real bottle into an AI-generated "messy bathroom shelf, phone flash, slightly off-centre" — and now you have a native-looking UGC still that didn't need a creator, a kitchen, or a half-day. Both are on-brand because in both the product is real.
The generator is a world-class photographer who has shot everything and will shoot anything — but is blind to your brand and has never seen your product. Hand them a one-line wish ("nice serum photo") and they shoot the generic stock image they've shot a thousand times. Hand them a brief — mood boards, the palette, the framing, the lighting, and the actual product on the table — and they shoot your ad. You're not pressing a button. You're on set, directing. The prompt is just how you talk to the crew.
The brief isn't a one-off prompt you write and lose. It's a saved template in your creative doc, with a row per generation and six input columns, so anyone (or any future you) can produce on-brand frames without re-deriving the recipe. It plugs straight into the genome tags from Day 4: the brief's persona/angle/treatment/format fields are the genome axes, captured at the moment of creation rather than guessed at later.
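One way to picture a brief row is as a record with one field per input. The template itself is not reproduced in this text, so the field names below are assumptions. The `img-b07-v03` ID comes from the exercise at the end of the lesson and is treated here as an opaque label.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class BriefRow:
    """One row per generation: an ID plus the six inputs (names assumed)."""
    brief_id: str
    subject_fidelity: str
    style_reference: str
    composition_format: str
    lighting_treatment: str
    text_in_image: str
    persona_angle: str

    def missing_inputs(self) -> list[str]:
        """Names of fields left blank, i.e. gaps the model would fill."""
        return [f.name for f in fields(self)
                if not getattr(self, f.name).strip()]


def to_csv(rows: list[BriefRow]) -> str:
    """Serialise rows so they can be pasted into a shared creative doc."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(BriefRow)])
    writer.writeheader()
    for r in rows:
        writer.writerow(asdict(r))
    return buf.getvalue()


row = BriefRow(
    brief_id="img-b07-v03",
    subject_fidelity="real serum bottle photo, composited into the scene",
    style_reference="brand palette plus 3 past on-brand images",
    composition_format="4:5 Feed, product lower-left, space above for copy",
    lighting_treatment="lo-fi, phone flash, messy bathroom shelf",
    text_in_image="none",
    persona_angle="budget-conscious parent, before/after relief",
)

if __name__ == "__main__":
    print(row.missing_inputs())
    print(to_csv([row]))
```

A blank field is a visible gap in the row, which makes it easy to catch a missing input before any credits are spent.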
Note the QA gate. Know how disclosure actually works before you assume it's on you. For ordinary product ads, Meta does not require you to self-flag AI: when its detection spots media created or significantly edited with generative AI, Meta automatically applies an "AI info" label, with no action needed from you (minor edits like resizing or colour correction aren't labelled at all). The mandatory advertiser-disclosure checkbox shows up only for ads registered in Meta's Social Issues, Elections or Politics category, and only when they contain photorealistic, deepfake-style created-or-altered media. There, failing to disclose can mean removal or account penalties.
Separately, the EU AI Act's Article 50 transparency duties take effect 2 August 2026 and bind generative-AI providers (mark outputs) and deployers (disclose deepfakes and public-interest AI text). An EU advertiser shipping realistic synthetic or deepfake creative may therefore have a labelling duty under EU law, though it's not a blanket "label every AI-touched ad" rule.
Practical gate: if a frame contains a realistic synthetic person, deepfake-style media, or runs in the politics/social-issues category, disclose it and check the current rules. A composited real product on an AI background is ordinary product creative: no self-disclosure is required, but know that Meta may auto-label the synthetic background. Build that check into the gate now; we'll formalise it as the QA stage on Day 15, when we build the full production line. Platform-policy facts checked June 2026; verify current rules before relying on them.
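The practical gate can be sketched as a small decision function. This is an illustration of the three triggers described above, not a rules engine. The field names and return labels are hypothetical, and the underlying policies need re-checking before use.

```python
from dataclasses import dataclass

DISCLOSE = "disclose-and-check-current-rules"
AUTO_LABEL = "no-self-disclosure-but-may-be-auto-labelled"
NO_ACTION = "no-action"


@dataclass
class FrameFacts:
    """What a reviewer records about a frame (field names are assumed)."""
    realistic_synthetic_person: bool
    deepfake_style_media: bool
    politics_or_social_issues: bool
    uses_generative_ai: bool = True


def disclosure_check(frame: FrameFacts) -> str:
    """Apply the lesson's practical gate to one frame."""
    # Any one of the three triggers is enough to require disclosure.
    if (frame.realistic_synthetic_person
            or frame.deepfake_style_media
            or frame.politics_or_social_issues):
        return DISCLOSE
    # Ordinary product creative: nothing to self-flag, but the platform
    # may still label generative-AI media automatically.
    if frame.uses_generative_ai:
        return AUTO_LABEL
    return NO_ACTION


if __name__ == "__main__":
    composite = FrameFacts(False, False, False)  # real product, AI background
    print(disclosure_check(composite))
```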
Most founders type a one-line wish, accept the first photoreal-looking result, and ship it, proud they "used AI." The output is generic AI slop (the training-data average, recognisable as an ad from space) or, worse, a hallucinated product that doesn't match what arrives in the box. Both quietly torch trust and CTR. The reframe, and your edge: AI image generation is not a vending machine, it's a direction skill. The operator who writes a six-input brief and composites the real product ships creative indistinguishable from a studio shoot at a fraction of the cost and, because every frame is briefed against the genome, ships volume the loop can actually learn from. Anyone can press generate. Almost no one briefs.
You've read the framework — now generate against it. Don't finish Day 13 with zero images.
- Copy the brief template into your creative doc as one row (the img-b07-v03 format — six input columns).
- Fill all six inputs for your product: subject + product fidelity, style ref, composition + format, lighting + treatment, text-in-image, persona + angle.
- Generate 4 frames in any one tool — 2 treatments (hi-fi / lo-fi) × 2 ratios (4:5 / 9:16).
- Run each through the QA gate and reject anything with a warped product, garbled text, or off-palette colour.
Done = 1 saved brief row + 4 tagged frames that passed the gate.
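The reject rule and the "Done" criterion can be written down as a short check. In this sketch a human reviewer still decides whether a frame is warped, garbled or off-palette; the code only records those flags and counts what passed. All names are hypothetical.

```python
# The three reject reasons from the exercise above.
REJECT_REASONS = ("warped_product", "garbled_text", "off_palette_colour")


def passes_gate(frame: dict) -> bool:
    """A frame passes if the reviewer raised none of the reject reasons."""
    flags = frame.get("flags", set())
    return not any(reason in flags for reason in REJECT_REASONS)


def is_done(saved_brief_rows: int, frames: list[dict]) -> bool:
    """Done = at least 1 saved brief row + 4 tagged frames that passed."""
    passed = [
        f for f in frames
        if passes_gate(f) and f.get("treatment") and f.get("ratio")
    ]
    return saved_brief_rows >= 1 and len(passed) >= 4


# The exercise's 2 treatments x 2 ratios, with no flags raised yet.
frames = [
    {"treatment": t, "ratio": r, "flags": set()}
    for t in ("hi-fi", "lo-fi")
    for r in ("4:5", "9:16")
]

if __name__ == "__main__":
    print(is_done(1, frames))
```

A rejected frame drops the count below four, so the check only turns true again once a replacement is generated and passes.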
Today's recap — 30 seconds
- The model is a commodity (Midjourney / FLUX / Ideogram / current Gemini image model — pick by job, mapped on Day 11); the brief is the craft.
- Six inputs make an image on-brand: subject+fidelity, style ref, composition+format, lighting+treatment, text-in-image, persona+angle — miss one and the model fills it with the bland average.
- Composite, don't hallucinate: generate the scene, drop in the real product photo. Never let the model invent what the customer will receive.
- AI does both hi-fi and lo-fi (Day 9) — the brief decides which; both stay on-brand because the product is real.
- The brief's fields are the genome tags (Day 4); add a disclosure check at the QA gate — Meta auto-labels generative-AI media on ordinary ads, while deepfake-style or politics-category creative carries a real disclosure duty (and EU AI Act Article 50 from Aug 2026).