Voice to text in any Mac app

Voice features keep showing up inside individual apps. A note-taking app adds a microphone button. A messaging app adds voice typing. A browser extension transcribes into one website. Each is useful on its own. But put them together and you get a strange situation: dictation works here, but not there, and works slightly differently in each place.

There is another way to think about voice on a Mac. Instead of dictation being a feature of one app, it can be a feature of the system — available everywhere a text cursor blinks. That is the idea worth unpacking, because once you have used dictation that way, the per-app version feels oddly limited.

What "system-wide" actually means

A Mac has one thing in common across nearly every app: a place where text goes. The message box in a chat app. The body of an email. A search field. A code editor. A form on a website. A spreadsheet cell. A document. Each of those is a text field with a cursor, and typed text arrives at that cursor the same way no matter which app drew it.

System-wide dictation hooks into that. It does not integrate with each app one by one. It listens for a key, captures your speech, and inserts the text wherever the cursor happens to be. The app receiving the text does not need to know anything about it. As far as that app is concerned, the words were typed.

This is how Lispr works. You hold the right Option key, speak, and release. The recognized text lands at the cursor — in any app. There is no list of supported apps, because there is no per-app integration to support. If you can type into it, you can dictate into it.
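
The hold, speak, release gesture can be modeled as a small state machine. The sketch below is only an illustration of that idea, not Lispr's implementation: the PushToTalk class, the "right_option" key name, and the transcribe and insert_text callbacks are all hypothetical stand-ins for a real key listener, speech recognizer, and text-insertion mechanism.

```python
from dataclasses import dataclass, field
from typing import Callable, List

RIGHT_OPTION = "right_option"  # hypothetical key name used only in this sketch


@dataclass
class PushToTalk:
    """Record while the trigger key is held; insert the text on release."""

    transcribe: Callable[[List[bytes]], str]  # stand-in for a speech recognizer
    insert_text: Callable[[str], None]        # stand-in for "type at the cursor"
    trigger_key: str = RIGHT_OPTION
    recording: bool = False
    _chunks: List[bytes] = field(default_factory=list)

    def key_down(self, key: str) -> None:
        # Start a fresh recording only when the trigger key goes down.
        if key == self.trigger_key and not self.recording:
            self.recording = True
            self._chunks = []

    def audio(self, chunk: bytes) -> None:
        # Audio that arrives while the key is not held is ignored.
        if self.recording:
            self._chunks.append(chunk)

    def key_up(self, key: str) -> None:
        # Releasing the trigger key ends the recording and inserts the result.
        if key != self.trigger_key or not self.recording:
            return
        self.recording = False
        text = self.transcribe(self._chunks)
        if text:
            # The receiving app is never consulted; it just sees text arrive.
            self.insert_text(text)
```

Nothing in the controller refers to a particular app. The only app-facing step is insert_text, which is why this design needs no list of supported apps.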

Why "works everywhere" matters more than it sounds

At first this sounds like a minor convenience. It is more than that, for a few reasons.

One habit instead of many

Per-app voice features each have their own button, their own gesture, their own quirks. To use them you have to remember which app has voice, where the button is, and how that particular app behaves. That mental overhead is small per app but real in aggregate, and it quietly discourages you from using voice at all.

System-wide dictation is a single habit. One key, everywhere. You stop thinking about whether a given app supports voice, because the question no longer exists. A habit you do not have to think about is a habit you actually keep.

The long tail of text boxes

Most of your typing is probably spread across a surprising number of small text boxes. A renaming dialog. A commit message. A calendar event title. A bug tracker comment. A search bar. No app vendor is going to build a polished voice feature into a renaming dialog. But system-wide dictation covers it anyway, because it does not care what the text box is for.

The value of voice is highest exactly in this long tail — the dozens of tiny writing moments that no single app will ever bother to optimize.

Consistency

When dictation behaves identically everywhere, you can trust it. You know what the gesture is, you know roughly how long it takes, you know the text will appear at the cursor. That predictability is what turns a feature into a tool. Per-app voice, by contrast, is a patchwork: fast here, slow there, and a different gesture somewhere else.

The honest case for built-in app features

System-wide dictation is not strictly better at everything, and it is worth being clear about that.

An app that builds voice into itself can do things a general tool cannot. It can route a transcript into a specific structured field. It can attach the original audio to a note. It can offer voice commands tied to that app's own actions. If your work lives almost entirely inside one app and that app has a voice feature designed for it, that tight integration may serve you better than a general-purpose tool.

There is also Apple's built-in Dictation, which is itself system-wide, free, and on recent macOS runs on-device. It is a genuinely reasonable default and costs nothing. If it works well for you, there is no need to add anything. We compare it with Lispr in detail in "Lispr vs Apple Dictation".

The point is not that system-wide always wins. It is that for most people, most of the time, having one dependable dictation gesture that works in every text box is more useful than a scattering of per-app features.

How it feels in practice

A normal stretch of work might touch a chat app, an email client, a browser, a code editor, and a couple of small dialogs. With system-wide dictation, that is one gesture used in five places. You hold the key, say the words, release, and they appear. You do not change tools or change habits when you change apps.

With Lispr specifically: the app is about 4 MB, lives in the menu bar with no window, and the round trip is roughly 200 milliseconds. It detects the spoken language automatically from among roughly 99, so switching languages mid-day needs no setting change. And it leaves your clipboard untouched, so dictating never disturbs whatever you had copied.

Closing

The useful shift is to stop thinking of dictation as something a particular app offers and start thinking of it as something your Mac does. Wherever a cursor blinks, you can speak instead of type. One gesture, every app, no per-app setup.

If you want to see this applied to a specific case, "Dictating in Slack on a Mac" walks through one app in detail, though the same gesture works just as well in your email, your editor, and every small dialog in between.