
Wiggle Catcher

How does a voice assistant understand you?
You say "Hey, what's the weather?" and a little speaker on your desk answers back. No buttons, no typing: just your voice, floating through the air as invisible wiggles of sound. How does a machine catch those wiggles and turn them into words it understands?

First, the microphone. Inside that speaker sits a tiny drum called a diaphragm. When your sound waves hit it, the diaphragm shivers: fast wiggles for high notes, slow ones for low. Those shivers become electrical pulses, a wobbly signal that closely matches the shape of your voice.

That signal is a mess: a squiggly line with your words, the hum of the refrigerator, your dog barking, all tangled together. So the assistant's first job is noise-scrubbing: it finds the repeating patterns (the fridge's steady hum) and subtracts them, leaving mostly just you.
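For curious grown-ups and older readers, here is a toy sketch of that noise-scrubbing idea in Python. Everything in it is an assumption made up for illustration: the sample rate, the 50 Hz hum, the pretend "voice", and the helper names `estimate_repeating_pattern` and `scrub`. Real assistants use far more sophisticated methods; this only shows the "learn the hum, then subtract it" trick.

```python
import math

RATE = 800                 # samples per second (toy value, assumed)
HUM_HZ = 50                # assumed frequency of the fridge hum
PERIOD = RATE // HUM_HZ    # samples in one hum cycle (16 here)

def hum(n):
    """The steady, repeating fridge hum at sample n."""
    return 0.5 * math.sin(2 * math.pi * HUM_HZ * n / RATE)

def voice(n):
    """A pretend voice: silent at first, then a burst of sound."""
    return math.sin(2 * math.pi * 130 * n / RATE) if n >= 400 else 0.0

# What the microphone hears: voice and hum tangled together.
recording = [voice(n) + hum(n) for n in range(800)]

def estimate_repeating_pattern(quiet, period):
    """Average a voice-free stretch cycle by cycle to learn one hum cycle."""
    cycles = len(quiet) // period
    return [
        sum(quiet[c * period + i] for c in range(cycles)) / cycles
        for i in range(period)
    ]

def scrub(signal, pattern):
    """Subtract the learned hum cycle, repeated over the whole signal."""
    period = len(pattern)
    return [s - pattern[i % period] for i, s in enumerate(signal)]

# Learn the hum from the quiet first half, then remove it everywhere.
pattern = estimate_repeating_pattern(recording[:400], PERIOD)
cleaned = scrub(recording, pattern)
```

In this toy, the hum repeats perfectly, so `cleaned` ends up almost identical to the pretend voice on its own. A real fridge is never that tidy, which is why the story says "mostly just you".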

Now comes the magic. The clean signal flows into a neural network, a program built like a giant web of connections, trained on millions of hours of human speech. It doesn't know what a "weather" is yet. It just knows patterns: which squiggles usually mean "weh," which mean "therr."

The network slices your sentence into tiny slivers, 20 or 30 per second, and for each sliver, it guesses: "Probably a 'w' sound. Probably an 'eh.' Probably a 't.'" It keeps a scoreboard of likely letters, like a chef tasting a stew and guessing the ingredients one pinch at a time.
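The slicing and the scoreboard can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 16,000 samples per second, the 25 slivers per second, and the raw scores are all invented, and the helper names `slice_into_slivers` and `scoreboard` are hypothetical. A real network would produce the raw scores itself; here they are typed in by hand.

```python
import math

def slice_into_slivers(samples, rate, slivers_per_second=25):
    """Cut a list of samples into equal, back-to-back slivers."""
    size = rate // slivers_per_second
    return [samples[i:i + size]
            for i in range(0, len(samples) - size + 1, size)]

def scoreboard(raw_scores):
    """Turn raw scores into probabilities that add up to 1 (a softmax)."""
    biggest = max(raw_scores.values())
    exps = {name: math.exp(score - biggest)
            for name, score in raw_scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}

# One second of (silent) pretend audio at an assumed 16,000 samples per second.
one_second = [0.0] * 16000
slivers = slice_into_slivers(one_second, rate=16000)

# Made-up raw scores for a single sliver, standing in for the network's output.
board = scoreboard({"w": 2.0, "eh": 0.5, "t": -1.0})
best_guess = max(board, key=board.get)
```

The scoreboard keeps every guess, not only the winner, so a later step can still change its mind about a sliver.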

But letters aren't enough: "weather" and "whether" sound identical. So a second network, called a language model, reads the letter-guesses and thinks about what makes sense. You said "what's the" before it, so "weather" (the sky thing) is way more likely than "whether" (the if-or-not thing). Context is everything.
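Here is a tiny sketch of how context can settle a tie between sound-alike words. It is a deliberately simple stand-in, not how a real language model works inside: the counts in `FOLLOW_COUNTS` are invented, and the function `pick_word` is a hypothetical helper that only looks at the one word that came before.

```python
# Invented counts of how often one word follows another.
# Real language models learn from enormous amounts of text.
FOLLOW_COUNTS = {
    ("the", "weather"): 90,
    ("the", "whether"): 1,
    ("know", "whether"): 40,
    ("know", "weather"): 2,
}

def pick_word(previous_word, candidates):
    """Choose the candidate seen most often after previous_word."""
    return max(candidates,
               key=lambda word: FOLLOW_COUNTS.get((previous_word, word), 0))

sound_alikes = ["weather", "whether"]
after_the = pick_word("the", sound_alikes)    # "weather"
after_know = pick_word("know", sound_alikes)  # "whether"
```

The same two sound-alike words get different answers depending on the word in front of them, which is the whole point of using context.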

Now the assistant has words: "Hey, what's the weather?" It breaks the sentence into chunks ("what's" is a question word, "weather" is the topic) and figures out your intent: you want information, specifically a weather report. That intent becomes an instruction the assistant's brain can execute.
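A rule-based toy version of that chunking step might look like the sketch below. The word lists, the action names such as `get_weather_report`, and the function `find_intent` are all assumptions made up for this example; real assistants typically learn this step from data instead of using hand-written lists.

```python
# Hypothetical word lists for the toy example.
QUESTION_WORDS = {"what's", "what", "how", "when"}
TOPICS = {"weather": "get_weather_report", "time": "get_time"}

def find_intent(sentence):
    """Break a sentence into words and look for a question word and a topic."""
    words = [w.strip("?,.!").lower() for w in sentence.split()]
    is_question = any(w in QUESTION_WORDS for w in words)
    for w in words:
        if w in TOPICS:
            return {"is_question": is_question, "topic": w, "action": TOPICS[w]}
    # No known topic found: the assistant would have to ask again.
    return {"is_question": is_question, "topic": None, "action": None}

intent = find_intent("Hey, what's the weather?")
```

For the sentence in the story, `intent` names "weather" as the topic and `get_weather_report` as the instruction to carry out.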

In a fraction of a second, it fetches the forecast, picks words for the answer, and reverses the whole process: text becomes sound-wave instructions, the speaker's diaphragm shivers outward, and a voice you can hear says, "It's sunny today." All from invisible wiggles in the air.

A Wonderleaf Book

Wonderleaf Editions · MMXXVI

~ finis ~

Tiny picture books for big little questions.

~ a small constellation of questions ~
✦ Wonderleaf Editions