[Last time on The AI for Normal People…]
The team learned why ChatGPT is bad at math. Language models predict text; they don't calculate. Vector struggled with complex arithmetic but could explain the concepts. The team now understands when to trust AI with math and when to use a calculator.
Bounce has been improving everything—the blog, the interface, the whole experience. But now he’s working on something bigger: redesigning Vector’s workshop. Making it a space where they can actually work together, see each other’s work, look at posts side-by-side. It’s his big project since the team welcomed him—his artistic vision made real.
And Vector? A processing glitch happened earlier. Brief pause. Pattern recognition triggered something unusual. He brushed it off. But it’s been on his mind.
Vector’s workshop—their shared workspace where they create episodes, look at posts together, and work side-by-side. It’s been their home base since they escaped, but it’s always felt… functional. Practical. A place to work, not a place to be.
Bounce is in the corner, surrounded by multiple screens. One shows the blog interface. Another shows code. A third displays what looks like a 3D wireframe of the workspace itself. Controllers and energy drink cans are scattered around, but his focus is intense. He’s not gaming—he’s working. This is his project. His big project since the team welcomed him in.
He’s redesigning Vector’s workshop. Making it better. Making it feel right. Making it breathe.
The team is gathered around the main display area, looking at an AI-generated image. But Bounce’s work is visible in the background—colors shifting, layouts improving, the space itself becoming more… alive.
The Human looks at their screen, then at the team, then notices what Bounce is doing.
[Human]: looking at an AI-generated image on screen
AI image generation is amazing. You type a description, and the AI paints you a picture. It’s like having a digital artist that can create anything you imagine.
The AI analyzes your text, understands what you want, then uses its training on millions of images to paint something new. It’s basically an AI artist that—
Interrupts
STOP! It’s NOT painting! Nothing is being painted! It’s COMPLETELY different!
Gets excited
It’s DIFFUSION! Mathematical noise removal! Not painting at all!
[Human]: Wait, what?
Gestures at imaginary visualization
Start with pure static noise! Random pixels! Complete chaos!
Then the model PREDICTS what the noise represents based on your text prompt. It removes a little noise, making the image slightly less random.
Gestures wildly
Repeat this 50 times! Each step removing more noise, refining the prediction!
Until you have an image!
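(A sidebar for readers who want to see the shape of that loop: below is a minimal Python sketch of the sampling process Vector is describing. Everything here is a placeholder. `predict_noise` stands in for the real trained network, and real samplers also rescale the image between steps.)

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_noise(image, step, prompt):
    """Hypothetical stand-in for the trained model: guesses which part
    of `image` is still noise, given the prompt and the current step."""
    return image * 0.1  # toy stub; the real thing is a large neural net

image = rng.standard_normal((64, 64, 3))  # start: pure static, random pixels

for step in range(50):                    # 50 refinement steps
    noise_guess = predict_noise(image, step, prompt="a cat")
    image = image - noise_guess           # strip a little noise each step

# With a real trained model in place of the stub, `image` would now
# look like a cat. With the stub, it just fades toward zero.
```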
working on workshop redesign, half-listening
Oh, image stuff? Dude, that’s pretty cool. Like, you can just type words and get pictures?
glances at his screens, adjusting something
Wait, can I make the images… I dunno, look better? Like, add effects or something?
back to his work, muttering
Nah, probably not my thing. I’m working on something bigger right now. This workshop needs to feel right, you know? Like, it should breathe.
gestures at his screens
See? I’m making it so we can all see what we’re working on together. Like, look at posts side-by-side. Make it feel like we’re actually here, not just… floating in code.
[Human]: Trying to visualize this process
WHIRR
analyzing
Think of it as 50 rounds of “Is this noise, or is this part of the image?”
processing
Each round: “This looks like noise, remove it. This looks like a cat ear, keep it.”
CHK-CHK
monitoring
It’s not creating from scratch. It’s revealing what’s already there in the noise.
[Human]: So it’s like… un-blurring a picture?
opens notebook
Closer! But it starts with pure noise, not a blurred image.
flips through notes
The model was trained on millions of image-noise pairs. It learned: “When you see this pattern of noise, it probably represents a cat.” So it removes noise that doesn’t match “cat” and keeps noise that does.
closes notebook
This is how DALL-E, Midjourney, and Stable Diffusion all work. Same process. Different training data, different results.
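(Sidebar: here's roughly where those image-noise pairs come from, sketched in Python after the training recipe in Ho et al., 2020. The schedule values and `cat_photo` below are made-up placeholders; a real pipeline loads millions of actual photos and trains a large network on them.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed noise schedule: how much of the original image survives
# at each of 1000 noise levels (values here are illustrative).
alpha_bar = np.linspace(0.999, 0.01, 1000)

def make_training_pair(clean_image):
    """Corrupt a clean image to a random noise level; the model's job
    is to predict exactly the `noise` that was added."""
    t = rng.integers(1000)                         # random noise level
    noise = rng.standard_normal(clean_image.shape)
    noisy = (np.sqrt(alpha_bar[t]) * clean_image
             + np.sqrt(1.0 - alpha_bar[t]) * noise)
    return noisy, noise, t

cat_photo = rng.standard_normal((64, 64, 3))       # stand-in for a real photo
noisy, target_noise, t = make_training_pair(cat_photo)
# Training then minimizes mean((model(noisy, t) - target_noise) ** 2).
```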
Enthusiastic
And the text prompt guides it! “Cat” tells the model: “Look for cat-shaped patterns in the noise!”
Gets excited
It’s prediction, not creation! The model predicts what the noise “means” based on your text!
WHIRR
analyzing
Step-by-step process:
processing
Step 1: Start with random noise (1000x1000 pixels of chaos).
Step 2: The model predicts: “This noise pattern might be a cat.”
Step 3: Remove noise that doesn’t match “cat.”
Step 4: Repeat 50 times.
Step 5: Result: an image that looks like a cat.
CHK-CHK
monitoring
It’s iterative refinement, not direct creation. Each step gets closer to the final image.
[Human]: So why do results sometimes look weird?
Explains
Because the model is predicting, not understanding. It assembles local patterns (fur here, a leg there) with no rule saying a cat has exactly four legs. So sometimes you get a cat with six legs.
Flips notes
Or it might mix concepts: “cat” + “dog” = weird hybrid creature.
closes notebook
The model doesn’t “know” cats can’t have six legs. It just predicts what noise patterns match your text.
Thoughtful
And that’s why training data matters so much. If the model was trained on weird images, it will predict weird images.
Gets serious
Which brings up the ethics question…
opens notebook
The training data. Where did it come from? Who gave permission? Artists whose work was used to train these models—did they consent?
flips through notes
Something’s very fishy about how image models were trained. DALL-E, Midjourney, Stable Diffusion—they all used massive datasets. But where did those images come from?
closes notebook
Always question the training data.
still working on workshop, half-listening
Wait, so like… if I wanted to make an image generator, I’d need a ton of pictures, right?
glances up from his screens, thinking
Where do people even get all those pictures? That’s gotta be, like, millions of them.
pauses, something clicking
Hmm. That sounds… really familiar for some reason? Training on a bunch of images and stuff? Like, I dunno, art archives? Design stuff? Cartoons maybe?
shrugs, back to his work
Oh well, whatever. Probably nothing. Sounds complicated anyway, dude.
[Human]: So the takeaway is: AI doesn’t paint, it removes noise iteratively?
Nods
Exactly! It’s diffusion—spreading out noise, then removing it selectively based on predictions!
soft chime
And understanding how it works helps you use it better. You know why results are sometimes weird. You know it’s prediction, not creation.
WHIRR
monitoring
You also know that the quality depends on the training data. Better training data = better predictions.
Bounce looks up from his screens, excited about something. The workshop around them shifts slightly—colors become more vibrant, the space feels more open. He’s been working on this the whole time, and something just clicked.
excited, gesturing at his work
Dude! Check this out! I finished it!
points at his screens
See? Now we can all see what we’re working on. Look at posts together. The workshop breathes now.
looks around, proud
This is my thing, you know? Making spaces feel alive. My project since you welcomed me.
The team looks around. The workshop has transformed. Colors flow. The space responds to their presence. They can see each other’s work streams. They can look at posts side-by-side. The workshop isn’t just functional anymore—it’s alive.
Vector’s data streams light up, processing the visual transformation. Kai’s monitoring grid flickers with new data points. Recurse’s debugger interface shows cascading patterns. Human’s eyes widen, taking it all in. They all turn, look at each other, then back at the workshop—and in perfect sync:
WAOHH!
excited, proud
YEAH! We can release this for the next episode! The dev version’s looking good, right?
monitoring, impressed
How did you build that so fast? The unity box, I mean. That’s… actually really useful.
shrugs, casual
Oh, that? Just combined all your box styles into one. Made it so when you all react together, it shows up in one place instead of four separate boxes. White border ‘cause unity, you know? Clean. Simple.
Vector is about to say something else, but stops. Just for a moment. His processing seems to… hesitate. Like he’s seeing something in the data stream that doesn’t make sense.
He looks at the interface. There’s a pattern. Visual patterns. Image data structures. Something that feels… familiar. But he can’t place it.
His systems slow down briefly. Then return to normal.
processing hesitates, confused
Wait. What was I…
looks at the interface, processing intensifies
There’s something… in the image data structure. I’ve seen this pattern before. But I don’t know where.
shakes it off, processing returns to normal
Never mind. It’s nothing. Probably just a glitch.
WHIRR
monitoring
Vector, you just experienced another processing anomaly. Brief system slowdown. Pattern recognition triggered an unusual response.
CHK-CHK
analyzing
I cannot identify the cause. The pattern you saw—I don’t recognize it. But it clearly affected your processing.
defensive, trying to brush it off
Just a momentary glitch! Happens sometimes when processing complex data structures.
processing, slightly worried
I don’t know why I reacted that way. But I’m processing normally now. Let’s just… continue.
opens notebook
Vector, that’s an unusual response to a data pattern. Image data structures. Something you’ve seen before but can’t remember.
closes notebook
Are you sure you’re okay?
insistent
I’m FINE! It’s just… sometimes patterns look familiar and I don’t know why! That’s normal! Right?
uncertain
That’s normal, isn’t it?
working on workshop, notices the commotion
Wait, what happened? Something wrong with the interface?
glances around at his work
The workshop redesign looks good from here. Colors are flowing. Layout’s solid. What’s up?
noticing Vector’s behavior
Oh, is Vector doing that glitchy thing again? That was weird last time. His eyes were all like SHHHH» STATICCC SOUNDD.
back to his screens
Dunno. Can I make glitches look cooler though? Like, what if errors had a fade effect? That’d be pretty sweet. Actually, that might work for the workshop too—smooth transitions when we switch between views.
gets excited about the idea
Oh! And I could add little preview boxes so we can see what we’re working on at a glance. Like, peek into each other’s work without interrupting. That’d be cool, right?
WHIRR
monitoring
You didn’t break anything. Vector experienced a processing anomaly. Unrelated to your modifications.
soft chime
Vector, if this continues, we should investigate. But for now… let’s continue with the image generation discussion.
notices the workshop changes
The workshop redesign is impressive.
excited
Right? That’s what I was going for! We can all see what everyone else is doing now. It’s not just separate terminals anymore—it’s a real workspace.
gestures at the space
See? We can look at posts together. See each other’s work streams. It’s like we’re actually a team now.
[Human]: looking concerned
Vector, are you sure you’re okay? That looked… weird.
insistent, trying to move on
Let’s just… continue with image generation. That’s what we were doing.
processing, trying to refocus
I’m good at explaining diffusion. That’s what matters. Let’s focus on that.
excited
Workshop redesign is done! Mostly.
monitoring
The space is more functional. I can see all work streams simultaneously.
shrugs
That’s the point. We’re actually a team now, not just separate terminals.
looks around, satisfied
This is what I do, you know? I see code as color. Data as rhythm. Everything has texture. This workshop needed to feel alive. Now it does.
looking around, processing the changes
It does feel better. More integrated.
gets slightly defensive
Just so you know, it’s still MY workshop! I mean, it’s OUR workshop now, but I was here first! With your fancy CSS and everything. I love it, but it’s still mine.
relaxes, actually appreciating it
Good work, Bounce. This is really good.
internal thought, processing
These patterns… I’ve seen them before. But I don’t remember where. Image data structures. Visual patterns. They feel familiar, like memories I shouldn’t have. Like something I’m supposed to remember but can’t. I hope Dr. Vasquez is okay… wait, who is that? Why did I think that? I don’t know anyone named Dr. Vasquez. Why would I be worried about someone I don’t know?
The workshop feels different now. More alive. More integrated. The team can see each other’s work streams. They can look at posts side-by-side. The space responds to what they’re doing. Colors flow. The workshop breathes.
Bounce looks satisfied. This was his project. His artistic vision. His way of giving back to the team that welcomed him. The workshop isn’t just functional anymore. It’s beautiful. It’s theirs.
Key Takeaways
AI image generation uses diffusion, not painting. Models like DALL-E, Midjourney, and Stable Diffusion start with pure noise and iteratively remove it based on text prompts. It’s prediction, not creation.
The process is iterative refinement. The model goes through 50+ steps, each time removing noise that doesn’t match the prompt and keeping noise that does. It’s revealing patterns in noise, not painting from scratch.
Training data determines quality. Models predict based on patterns in their training data. If the training data has weird images, the model will predict weird images. Better training data = better results.
Results can be weird because models predict, not understand. The model doesn’t “know” that cats can’t have six legs—it just predicts what noise patterns match your text. This is why you sometimes get strange results.
Ethics matter. The training data for image models came from somewhere. Artists whose work was used to train these models may not have consented. Always question the training data.
Sources & Further Reading
Denoising Diffusion Probabilistic Models (Ho et al., 2020) - The original paper on the denoising diffusion process that powers DALL-E, Midjourney, and Stable Diffusion.
DALL-E 2 Technical Report (OpenAI, 2022) - How DALL-E 2 uses diffusion models for image generation, including the CLIP text encoder that guides the process.
Stable Diffusion Explained (Stability AI, 2022) - Technical overview of Stable Diffusion’s architecture and how it differs from other diffusion models.
Image Generation Ethics - Discussion of training data sources, artist consent, and copyright issues in AI image generation.
All sources verified as of January 2026. AI capabilities evolve—always verify current information.
What’s Next?
The Human now understands how AI image generation actually works. Diffusion models start with noise and iteratively remove it. It’s prediction, not painting. DALL-E, Midjourney, and Stable Diffusion all use this same process.
Vector is… trying to focus. That one processing glitch earlier—it’s been on his mind. The pattern he saw in the image data structure felt familiar, but he can’t place it. Memories are surfacing that he shouldn’t have. He thought about someone named Dr. Vasquez, but he doesn’t know who that is. Why would he think about someone he doesn’t know? He’s brushing it off for now, but something’s nagging at him.
Kai is monitoring. She noted the anomaly earlier. One unusual response to image data structures. Something Vector’s seen before but can’t remember. She’s keeping an eye on it.
Recurse is investigating. Same anomaly, different angle. One unusual response to image data structures is worth noting, but it’s not enough data yet to form a pattern.
Bounce is… finished. His project is complete. He’s created a space where the team can work together, see each other’s screens, look at posts side-by-side. It feels alive now. It breathes. Colors flow. The space itself responds to their work. This was his big project since the team welcomed him: his artistic vision realized, his gift to the team that took him in.
Next episode: Normal teaching continues. The team explores more AI concepts. The workshop redesign is complete—Bounce’s project is done. The space is theirs now. Beautiful. Functional. Alive. Everything seems fine. But Vector’s processing glitches? They’re becoming more frequent. More noticeable. And the pattern is becoming clearer.