Dictating technical terms, names, and jargon

Every dictation tool, no matter how good, has the same weak spot: technical terms, product names, people's names, and field-specific jargon. If you dictate ordinary prose, accuracy is high. The moment you say a library name, a drug name, or a colleague's surname, the error rate jumps. This is not a flaw in one app — it is a structural limit of how speech recognition works.

This post explains why that limit exists and gives you honest, practical tactics for working around it.

Why technical terms are genuinely hard

A speech model turns sounds into the most likely sequence of words, and "most likely" is doing a lot of work in that sentence. The model leans heavily on language patterns: it knows which words tend to follow which. That is why it transcribes everyday sentences so well.

Specialized terms break the pattern in two ways.

First, they are rare. A model has seen the word "the" billions of times and a niche framework name almost never. It has weak expectations for rare words, so it falls back on common words that sound similar — "Kafka" becomes "coffee", "kubectl" becomes "cube control".

Second, proper nouns are unpredictable by design. There is no language rule that says your colleague is named "Siobhan" rather than "Shevaun". Names do not follow spelling logic, so the model genuinely cannot infer them from sound.
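The first effect can be sketched with a toy calculation. This is a deliberately simplified picture: real speech models, Whisper included, are end-to-end neural networks and do not split the decision into two neat factors. The probabilities below are invented for illustration, and the names `candidates`, `score`, and `pick` are hypothetical.

```python
import math

# Invented numbers, for illustration only.
# "acoustic": how well the candidate matches the sound that was heard.
# "prior": how common the word is in the text the model learned from.
candidates = {
    "Kafka": {"acoustic": 0.60, "prior": 1e-7},
    "coffee": {"acoustic": 0.40, "prior": 1e-4},
}

def score(candidate):
    # Multiplying probabilities is the same as adding their logarithms.
    return math.log(candidate["acoustic"]) + math.log(candidate["prior"])

def pick(cands):
    # Choose the candidate with the highest combined score.
    return max(cands, key=lambda word: score(cands[word]))

best = pick(candidates)
print(best)
```

Even though "Kafka" matches the sound better in this sketch, the far more common word wins once word frequency is taken into account.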

Once you see why this happens, the tactics make sense: you are not trying to fix the model. You are giving it help where it has none, and accepting cleanup where help is not possible.

Tactic one: say it naturally, fix it after

The instinct when a term comes up is to slow down and over-pronounce it. Resist that. Exaggerated pronunciation gives the model an unusual sound it has heard even less than the normal one, so it often makes things worse.

Instead, say the term the way you normally say it, in stride, and keep going. Dictate the whole sentence or paragraph. Then go back and fix the one or two terms that came out wrong, by hand.

This works because the cost of fixing a known word is low (you click and type it), while the cost of breaking your flow is high. Dictation is fast because of momentum. Stopping to fight each term destroys that and makes the whole task slower than typing would have been. Draft with momentum, correct in a pass at the end. See "Editing your text by voice" for more on this two-phase rhythm.

Tactic two: spell it out when you must

For a short, critical term that has to be exact the first time — an order number, an unusual surname in a formal letter — spelling can be the right call. Most tools recognize individual letters reasonably well because letters are a small, fixed set of sounds.

A few practical notes:

Spell at a steady pace with a small gap between letters, not in a rushed burst.
Expect to fix capitalization yourself. "A-P-I" may come back lowercase.
For letters that sound alike over a microphone — m and n, b and p, f and s — a clear, separated delivery helps.

Spelling is slower than speaking, so reserve it for the cases that genuinely need first-pass accuracy. For most terms, the fix-after approach is faster overall.
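If spelled terms come back as hyphen-separated lowercase letters, the capitalization cleanup can be automated. The following is a minimal sketch, assuming the tool returns spelled letters in the form "a-p-i"; that output shape is an assumption, and `join_spelled` is a hypothetical helper, not a feature of any dictation app.

```python
import re

# A run of two or more single letters joined by hyphens, e.g. "a-p-i".
# The word boundaries keep ordinary hyphenated words like "e-mail" untouched.
SPELLED = re.compile(r"\b[A-Za-z](?:-[A-Za-z])+\b")

def join_spelled(text: str) -> str:
    """Collapse spelled-out letters into a single uppercase token."""
    return SPELLED.sub(lambda m: m.group().replace("-", "").upper(), text)

print(join_spelled("call the a-p-i now"))
```

Uppercasing suits acronyms. A spelled surname would need title case instead, so that case is better fixed by hand.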

Tactic three: type the worst offenders

The honest truth is that some words are simply not worth dictating. A handful of terms recur constantly in your work and are reliably wrong: the product you build, your company name, the names of your team, the framework you live in.

For those, stop dictating them. Type them. You can:

Type the term, then dictate the sentence around it.
Set up a text-expansion snippet — a short trigger that expands to the full term — and dictate the trigger.
Keep your draft loose and do a find-and-replace pass at the end for the one or two terms you know will be mangled.

This is not giving up. It is correct tool selection. Voice is for the generative bulk of the text; your keyboard handles the dozen tokens it cannot. Typing those few terms, instead of fighting the same word forty times, is simply efficient.
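The find-and-replace pass can be scripted if you keep your drafts as plain text. This is a minimal sketch, not part of any dictation tool: the `CORRECTIONS` table and the `apply_corrections` helper are hypothetical, and the entries are example mis-hearings you would replace with your own.

```python
import re

# Hypothetical table: what the tool tends to write -> what you meant.
CORRECTIONS = {
    "cube control": "kubectl",
    "post gress": "Postgres",
}

def apply_corrections(text: str, corrections: dict) -> str:
    """Replace each known mis-transcription with the intended term."""
    for wrong, right in corrections.items():
        # Whole-word, case-insensitive match, so "Cube control" is caught too.
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE)
        # A function replacement inserts the term literally, with no escape handling.
        text = pattern.sub(lambda m, right=right: right, text)
    return text

print(apply_corrections("Run Cube control get pods against post gress", CORRECTIONS))
```

Only unambiguous mis-hearings belong in such a table. Blindly replacing "coffee" with "Kafka" would also rewrite every sentence that really is about coffee.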

Tactic four: use the context the model has

The model does use surrounding words to disambiguate, so you can sometimes help it by giving more context.

A term embedded in a full sentence fares better than the same term said alone. "We deployed it with Kubernetes" gives more to work with than "Kubernetes" by itself.
Saying an acronym as a word, when that is how people normally say it, can land better than spelling it. "JSON" said as a word often beats "J-S-O-N".
Conversely, where usage splits between letters and a word, as with "S-Q-L" versus "sequel", dictate whichever form your field actually says. The model has heard the common pronunciation more often.

In short: say things the way your field says them out loud. The model has heard the field's real speech, and matching it gives you the best odds.

Set realistic expectations

Be honest with yourself about what a clean transcript looks like in technical work. If you dictate a paragraph dense with library names, command-line flags, and surnames, you will have corrections to make. That is normal and not a sign you are doing it wrong.

The right comparison is not "perfect transcript versus errors". It is "dictate the prose and fix a few terms" versus "type every word". For most technical writing — emails, documentation, notes, commit messages, chat — the first is still faster, because the prose is the bulk of the text and the terms are a small fraction. Where a passage is almost entirely jargon and symbols, typing may genuinely win. Pick per task.
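The comparison can be put into rough numbers. The sketch below is back-of-the-envelope arithmetic: the speaking speed, typing speed, and per-fix cost are assumed values, not measurements, and you should substitute your own.

```python
def dictate_seconds(words, mangled_terms, speak_wpm=130.0, fix_seconds=5.0):
    """Time to dictate the text, plus a fixed cost for each term fixed by hand."""
    return words / speak_wpm * 60 + mangled_terms * fix_seconds

def type_seconds(words, type_wpm=50.0):
    """Time to type every word."""
    return words / type_wpm * 60

# A 200-word email with six mangled terms (assumed figures).
print(dictate_seconds(200, 6), type_seconds(200))

# A 40-word passage that is almost entirely jargon (assumed figures).
print(dictate_seconds(40, 25), type_seconds(40))
```

Under these assumptions dictation wins comfortably for the email and loses for the jargon-dense passage, which is the "pick per task" point in numbers.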

A workable routine

Putting it together for a technical message or document:

Type the two or three terms you know are unrecognizable, or have snippets ready for them.
Dictate the rest at a natural, even pace — do not slow down for jargon.
Say technical terms the way your field says them aloud; do not over-pronounce.
Spell out only the rare term that must be exact on the first pass.
Read the draft once at the end and fix the handful of mangled terms.

After a week you will know your own personal list of "always type these" words, and dictation in your field gets noticeably smoother.
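If you keep both the raw transcript and your corrected version, that personal list can be derived rather than remembered. This is a minimal sketch using Python's standard `difflib`. The helpers `mangled_pairs` and `always_type_list` are hypothetical, and the approach assumes your edits were limited to fixing terms rather than rewriting sentences.

```python
from collections import Counter
from difflib import SequenceMatcher

def mangled_pairs(raw: str, fixed: str):
    """Return (transcribed, corrected) pairs where the two drafts differ."""
    a, b = raw.split(), fixed.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            pairs.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return pairs

def always_type_list(drafts, min_count=2):
    """Terms that needed correcting in at least min_count places."""
    counts = Counter(
        right for raw, fixed in drafts for _, right in mangled_pairs(raw, fixed)
    )
    return [term for term, n in counts.most_common() if n >= min_count]

drafts = [
    ("we moved the queue to coffee last week", "we moved the queue to Kafka last week"),
    ("coffee consumers lag behind", "Kafka consumers lag behind"),
    ("run cube control get pods", "run kubectl get pods"),
]
print(always_type_list(drafts))
```

Terms that clear the threshold are candidates for typing or for a snippet. One-off errors stay in the fix-after pass.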

The honest summary

Technical terms, names, and jargon are the genuine hard limit of every dictation tool, because speech models rely on language patterns that rare and arbitrary words simply break. The winning strategy is not to fight it word by word. Say terms naturally and fix them in a pass; spell out the few that must be exact on the first pass; and outright type the handful of offenders that recur constantly.

Lispr uses the Whisper speech model, which handles ordinary prose very well and technical terms about as well as any tool can — which is to say, imperfectly. Hold the right Option key, dictate the bulk of your text with momentum, and keep your keyboard handy for the dozen tokens that need it. That division of labor is what makes voice fast even in technical work.