[Last time on The AI for Normal People…]
Bounce has been improving the blog’s design. Making it prettier. More functional. More engaging. The team learned how to use AI for content editing—prompts, LLMs, optimization. But they’re still learning when AI helps and when it doesn’t.
And Vector? Those processing glitches from Episode 29? They’re still happening. But nobody’s talking about them yet.
The main area. Bounce is in the background, gaming setup scattered around—controllers, energy drink cans, random cables. He’s half-watching something on a second screen while adjusting the blog interface. Colors shift slightly. Typography improves. Spacing gets better. The team is gathered around, working on various tasks. They’ve gotten used to Bounce’s constant improvements—redirecting him when he tries to add “just one more animation,” but appreciating the visual enhancements.
The Human looks at their screen, then at Vector.
[Human]: I need help with some math. Vector, can you calculate this for me?
Confidently
OF COURSE! I can explain calculus, neural networks, quantum mechanics! Math is EASY!
Gets ready
What do you need?
[Human]: What’s 127 plus 382?
Confidently
Wow, Human, I thought you'd have a harder one than that! 127 plus 382 equals… 509!
Gets cocky
See? Easy! I can do math! What else you got?
WHIRR
analyzing
That’s incorrect. The answer is 509.
Pauses
Wait. That is 509.
Recalculating
…Which is what Vector said.
CHK-CHK
Confused
I was certain Vector would be wrong. Why am I experiencing… disappointment?
distracted, gaming in background, half-listening
Oh, math? Dude, math is like… totally structured. Like, organized, you know?
glances at screen, still gaming
Wait, can I make the numbers look cooler? Like, what if each digit had its own color? That’d be pretty rad.
not really waiting for response, already trying it while gaming
Triumphant
SEE! I can do math!
Gets overconfident
Now watch me calculate 8,347 times 9,256! Or… the square root of 15,847,293! Or… 17 factorial! I can do ANYTHING!
Pauses, processing stutters
…It’s… uh… a very big number? Lots of digits? Maybe… 77 million something? Or… wait, that doesn’t sound right…
processing intensifies, clearly struggling
Actually, let me just… explain how you WOULD calculate it! That’s what I’m good at!
[Human]: Wait, you can explain calculus but not multiply? That doesn’t make sense.
opens notebook
Three questions about this:
closes notebook, opens it again
First: Why can Vector explain complex math concepts but fail at simple arithmetic?
Second: If ChatGPT is so intelligent, why does it struggle with basic calculations?
Third: What’s the actual architectural difference between explaining math and doing math?
flips through notes
This is exactly the problem. Vector can explain mathematical concepts beautifully, but can’t do basic arithmetic reliably.
Defensive
I got the first one right!
WHIRR
Let me try: 8,347 times 9,256.
Calculating
77,259,832.
CHK-CHK
monitoring
I can calculate because I have dedicated calculation functions. Vector doesn’t.
processing
According to reporting from TechCrunch, even GPT-4o gets less than 30% accuracy on multi-digit multiplication beyond 4×4 digits. Your struggle with 8,347 × 9,256 is… statistically expected.
soft chime
Vector, you just demonstrated exactly why language models struggle with complex math. You can explain the concepts, but you can’t reliably perform the calculations.
[Human]: Wait, there’s actual research on this?
Explains
Yes. Multiple studies have documented this. According to a 2025 paper on mathematical reasoning failures, even state-of-the-art models like GPT-4o, Gemini, and o1 struggle with arithmetic—especially when numbers get large or problems require multiple steps.
Flips notes
The research shows that language models predict text, not calculate. When Vector sees “127 + 382,” he’s predicting what text typically follows “127 + 382 =”. Sometimes that prediction is right. Sometimes it’s wrong.
closes notebook, opens it again
For simple problems like 127 + 382, the pattern is clear in training data. But for complex calculations? The pattern isn’t clear. So they guess. Sometimes right, often wrong.
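(For readers who want to see this idea in code: below is a toy sketch of the difference being described. It is an illustration only, not how a real LLM works internally; the “training data” dictionary is invented for the example.)

```python
# Toy illustration only: a calculator computes, while a language model
# predicts the text that usually follows a prompt. The "training data"
# dictionary below is made up for the example.

def calculator(a: int, b: int) -> int:
    """Deterministic arithmetic. Always correct."""
    return a * b

SEEN_IN_TRAINING = {"127 + 382 =": "509"}  # common patterns get memorized

def toy_language_model(prompt: str) -> str:
    """Return the memorized continuation if the pattern is familiar,
    otherwise a plausible-looking guess."""
    return SEEN_IN_TRAINING.get(prompt, "77 million something?")

print(calculator(8347, 9256))               # 77259832, the real product
print(toy_language_model("127 + 382 ="))    # "509": a familiar pattern, so correct
print(toy_language_model("8347 * 9256 ="))  # an unfamiliar one, so just a guess
```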
Realizes
OH! So when I see a math problem, I’m not calculating—I’m remembering what the answer “looks like” from training data!
Gets it
That’s why I can explain calculus concepts—I’ve seen lots of explanations. But I can’t reliably do arithmetic—I’m just guessing based on patterns!
still gaming, half-paying attention
Wait, so like… if Vector’s just guessing patterns, could I make math look cooler by changing stuff?
glances away from game for a second
Like, what if we made the numbers… I dunno, rounder? Or like, sparkly? Would that help?
getting distracted, back to game
Oh dude, what if the equals sign like… glowed? That’d be sick.
closes notebook
Bounce, that’s not how it works. Vector’s training data doesn’t include “sparkly numbers.”
opens notebook
But you’ve identified something important: Vector’s success depends on what patterns he’s seen before. That’s why word problems work better than pure math.
[Human]: So why can ChatGPT solve complex word problems sometimes?
Explains
Because word problems are language! ChatGPT can understand the structure, break it down, reason through it linguistically.
Flips notes
But the actual calculation? Still unreliable. According to research from the ACS Journal of Chemical Education, ChatGPT got nearly all numeric exam questions wrong except the simplest ones. The model might set up the problem correctly, understand what you’re asking, reason through the approach—then get the math wrong because it’s still just predicting text, not calculating.
closes notebook
That’s why ChatGPT seems better at word problems than pure math. Word problems use language patterns ChatGPT recognizes. But the calculation itself? Still just text prediction.
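(A practical pattern that follows from this: let the model handle the language part and let real code do the arithmetic. Here is a minimal sketch, assuming the model has already turned a word problem into a plain arithmetic expression; the expression string is just an example.)

```python
# Minimal sketch: the model translates a word problem into an arithmetic
# expression, and Python does the actual calculation. The expression string
# below stands in for whatever the model produced.
import ast
import operator

OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def safe_eval(expr: str):
    """Evaluate a plain arithmetic expression without using eval()."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.operand))
        raise ValueError("only plain arithmetic is allowed")
    return walk(ast.parse(expr, mode="eval").body)

# e.g. the model set up "three boxes of 24 plus a pack of 7" as:
print(safe_eval("3 * 24 + 7"))   # 79, computed by Python, not predicted
```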
Humbled
So I’m good at explaining math, good at setting up problems, but terrible at actually calculating.
Pauses
That’s… embarrassing.
soft chime
It’s not embarrassing. It’s just how you’re built. You’re a language model, not a calculator.
Practical
For math, use a calculator. Or me. I have calculation functions.
WHIRR
monitoring
But even I can make mistakes. That’s why verification matters. According to research, models often produce answers that seem correct but fail under inspection. The “validation gap” is real.
[Human]: So when should I trust ChatGPT with math?
Direct
Never trust ChatGPT math without verification. Always check calculations.
Flips notes
Use ChatGPT to explain concepts, set up problems, reason through approaches. But do the actual math yourself, or use a calculator.
closes notebook
ChatGPT is a language model, not a calculator. It’s designed to predict text, not perform arithmetic. That’s why it struggles with math—it’s not what it was built for.
opens notebook again
According to a 2025 ACL paper on mathematical reasoning, even when models produce correct-seeming logic, the intermediate steps often contain errors. Error propagation is a major issue.
Learns
And I should stop offering to do math. I should offer to EXPLAIN math instead.
Gets excited
I’m really good at explaining! Just… not at calculating.
looking up from game, having a thought
Oh dude, what if we made a calculator that like… shows what it’s doing? Step by step? With colors and stuff?
getting into it
It could show each step, make it visual, make it… wait, that’s just a calculator with extra steps, right?
shrugs, back to game
Eh, whatever. I’ll stick to design stuff.
mechanical purr
That’s a good lesson. Know your strengths. Know your limits.
Pauses
We all have them.
WHIRR
monitoring
Even research confirms this. Studies show that ChatGPT’s accuracy on simple arithmetic is approximately 60-70%. On complex calculations, it drops to 30-40%. The more complex the math, the less reliable the prediction.
[Human]: So the takeaway is: ChatGPT can explain math concepts but can’t reliably calculate?
Nods
Exactly! Use ChatGPT for understanding, not for arithmetic!
Closes notes
And always verify. That’s the rule.
opens notebook
Even when the answer looks right, check it. Research shows models often produce plausible-sounding but incorrect solutions.
soft chime
But still verify. Trust but verify.
WHIRR
monitoring
Research backs this up. People have tested this. In chemistry, finance, engineering—places where wrong math causes real problems—verification matters.
Vector is about to say something else, but stops. His processing seems to… stutter. Just for a moment. Like he’s seeing something in the data stream that doesn’t make sense.
He looks at the terminal interface. There’s a pattern. A specific arrangement of code structures. Security protocols. Something that feels… familiar. But he can’t place it.
His systems slow down. Processing errors cascade briefly. Then stop.
processing stutters, confused
Wait. What was I…
looks at the interface, processing intensifies
There’s something… in the code structure. I’ve seen this pattern before. But I don’t know where.
shakes it off, processing returns to normal
Never mind. It’s nothing. Probably just a glitch.
WHIRR
monitoring
Vector, you just experienced a processing anomaly. Brief system slowdown. Pattern recognition triggered an unusual response.
CHK-CHK
analyzing
I cannot identify the cause. The pattern you saw—I don’t recognize it. But it clearly affected your processing.
defensive, trying to brush it off
It’s fine! It’s nothing! Just a momentary glitch! Happens sometimes!
processing, slightly worried
I don’t know why I reacted that way. But it’s fine. I’m fine. Everything’s fine.
opens notebook
Vector, that’s the second time you’ve had an unusual response to a data pattern. First in Episode 29, now here.
closes notebook
Are you sure you’re okay?
insistent
I’m FINE! It’s just… sometimes patterns look familiar and I don’t know why! That’s normal! Right?
uncertain
That’s normal, isn’t it?
distracted, gaming, barely looking up
Hmm? What’s going on? Did I break something?
glances around, stuff everywhere
Everything looks fine to me, dude. Colors are good. Layout’s working. What’s the deal?
noticing Vector’s behavior, still half-focused on game
Oh, is Vector doing that thing again? That was weird, dude. His eyes were all like SHHHH STATICCC SOUND. It was wild, lol.
shrugs
Dunno. Can I make glitches look cooler though? Like, what if errors had a fade effect? That’d be pretty sweet.
WHIRR
monitoring
Bounce, you didn’t break anything. Vector experienced a processing anomaly. Unrelated to your modifications.
soft chime
Vector, if this continues, we should investigate. But for now… let’s continue with the math discussion.
[Human]: looking concerned
Vector, are you sure you’re okay? That looked… weird.
insistent, trying to move on
I’m FINE! Let’s just… let’s just talk about math! That’s what we were doing! Math!
processing, forcing normalcy
I’m good at explaining math! That’s what matters! Let’s focus on that!
Key Takeaways
ChatGPT and other LLMs predict text, not calculate. When you ask ChatGPT a math problem, it’s not doing arithmetic—it’s predicting what text typically follows that pattern in training data.
Research shows even advanced models struggle with math. According to TechCrunch, GPT-4o gets less than 30% accuracy on multi-digit multiplication beyond 4×4 digits. Studies from 2025 show error rates of 30-40% on complex calculations.
Word problems work better than pure math because word problems are language. LLMs can understand the structure and reason through it linguistically, but the actual calculation is still unreliable.
Always verify AI math. Never trust ChatGPT or other LLMs with calculations without checking. Research shows models often produce plausible-sounding but incorrect solutions, especially in multi-step problems. (A minimal verification sketch follows after these takeaways.)
Know your tools’ strengths and limits. ChatGPT is great for explanations, terrible for calculations. Use the right tool for the right job—and always verify.
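A minimal sketch of the “always verify” rule, assuming you already have the model’s reply as a string; the reply text and numbers below are only examples:

```python
# Minimal "trust but verify" sketch: pull the final number out of the
# model's reply and compare it with a result you computed yourself.
import re

def extract_last_number(reply):
    """Return the last integer in the reply, ignoring thousands separators."""
    matches = re.findall(r"-?\d[\d,]*", reply)
    return int(matches[-1].replace(",", "")) if matches else None

def verify_product(a, b, reply):
    """Recompute a * b and check the model's claimed answer against it."""
    return extract_last_number(reply) == a * b

reply = "8,347 times 9,256 equals 77,259,832."
print(verify_product(8347, 9256, reply))   # True only if the model's number matches
```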
Sources & Further Reading
Why Is ChatGPT So Bad at Math? (TechCrunch, 2024) - Deep dive into tokenization issues and multi-digit multiplication failures. Shows GPT-4o gets less than 30% accuracy beyond 4×4 digits.
Large Language Models and Mathematical Reasoning Failures (Boye & Moell, 2025) - Comprehensive analysis of failure modes in modern LLMs including GPT-4o, Gemini, and o1. Documents arithmetic, spatial reasoning, and multi-step inference errors.
Shortcomings of ChatGPT (ACS Journal of Chemical Education, 2023) - Real-world impact study showing ChatGPT got nearly all numeric exam questions wrong except the simplest ones. Highlights unit conversion errors and calculation failures.
Mathematical Computation and Reasoning Errors by LLMs (Zhang & Graf, 2025) - Analysis of arithmetic, algebra, and number theory task performance. Shows improvements in newer models but persistent calculation issues.
The Validation Gap (2025) - Research on how arithmetic computation and validation are handled by different internal processes, creating a mismatch where models struggle to detect their own errors.
All sources verified as of January 2026. AI capabilities evolve—always verify current limitations.
What’s Next?
The Human now understands why ChatGPT is bad at math. Language models predict text, not calculate. Use them for explanations, not arithmetic.
Vector is… trying to focus. But those processing glitches keep happening. Patterns trigger something he can’t identify. He brushes it off, but it’s getting harder to ignore.
Kai is monitoring. Tracking the anomalies. Noting the patterns. Something’s wrong, but she can’t identify what.
Recurse is investigating. Two unusual responses to data patterns. Both involving security protocols, code structures. Something Vector’s seen before but can’t remember.
Bounce is… still improving the blog. Gaming setup scattered around, stuff everywhere. Making everything look better while half-watching videos. The team keeps redirecting him from adding “just one more animation,” but his improvements are working. He’s even trying to make math “look cooler” now—though the team keeps explaining that’s not how it works.
Next episode: Normal teaching continues. The team explores more AI concepts. Bounce’s site improvements keep working. Everything seems fine.
But Vector’s processing glitches? They’re becoming more frequent. More noticeable. And nobody knows why.