Cloud vs on-device transcription: which to choose

May 5, 2026 · 5 min read

Every voice-to-text tool has to answer one architectural question before anything else: where does the speech recognition actually run? Either on your own Mac, or on a server somewhere else. That single choice shapes how fast the tool feels, how accurate it is, whether it works offline, and what happens to your audio.

This post lays out the real trade-off without taking a side, and then explains where Lispr sits — which is on the cloud side. The goal is to help you decide for yourself, because the right answer genuinely depends on you.

What "on-device" means

On-device transcription runs the speech model on your own hardware. Your audio is captured, processed, and turned into text without ever leaving the machine. Apple Dictation on modern Macs works this way, as do local apps built on the open Whisper model, such as MacWhisper.

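The advantages are concrete: your audio never leaves the machine, so privacy is built into the architecture, and dictation keeps working offline.

The costs are also concrete: the app carries the weight of a local model, and you accept a compromise between speed and accuracy, which is most noticeable on an older Mac.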

What "cloud" means

Cloud transcription sends your audio over the network to a server, where a powerful speech model turns it into text and sends the text back. Many modern dictation apps work this way, Lispr included.

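The advantages: speed, strong accuracy, and a very light app, because the heavy model runs on a server rather than on your Mac.

The costs: your audio leaves your machine, dictation does not work offline, and you have to trust the tool's policy on what happens to that audio.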

How to decide

There is no universally correct choice. Work through these questions honestly.

  1. Do you need to dictate offline? If yes, on-device wins outright. Cloud cannot do it.
  2. Is keeping audio physically on your machine a hard requirement? For some work — confidential legal, medical, or sensitive personal material — the answer is genuinely yes. Then on-device is the honest choice, full stop.
  3. How old is your Mac, and how much do you dictate? Heavy dictation on an older Mac can make a local model feel sluggish. The cloud removes that constraint.
  4. If you are comfortable with cloud, do you trust this specific tool's handling? Judge the tool, not the category. Read its privacy policy. Check what it says about storage, training, and accounts. We give a full checklist in our post "Is voice dictation private".

A useful way to frame it: on-device is a privacy and independence choice; cloud is a speed, accuracy, and lightness choice. Decide which of those you are optimizing for, and the architecture follows.
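For readers who think in code, the four questions above can be read as a small decision function. The following is only a sketch of one possible reading: the `Needs` fields, the `recommend` function, and the "either" outcome are illustrative names and assumptions, not part of Lispr or any other tool.

```python
from dataclasses import dataclass


@dataclass
class Needs:
    """Answers to the four questions (all field names are illustrative)."""
    must_work_offline: bool           # question 1
    audio_must_stay_local: bool       # question 2
    local_model_feels_sluggish: bool  # question 3: older Mac, heavy dictation
    trusts_tool_policy: bool          # question 4: this tool, not the category


def recommend(needs: Needs) -> str:
    """Return 'on-device', 'cloud', or 'either' by applying the questions in order."""
    # Questions 1 and 2 are hard requirements: either one settles the matter.
    if needs.must_work_offline or needs.audio_must_stay_local:
        return "on-device"
    # Question 4: cloud is only an option if you trust the specific tool.
    if not needs.trusts_tool_policy:
        return "on-device"
    # Question 3: with no hard constraint, a sluggish local model tips toward cloud.
    if needs.local_model_feels_sluggish:
        return "cloud"
    # Assumption: nothing forces the choice, so it comes down to preference.
    return "either"


print(recommend(Needs(False, False, True, True)))
```

The ordering is the point: the hard requirements are checked first, and performance only matters once they are out of the way.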

Where Lispr sits

To be plain about it: Lispr is a cloud-based tool. When you hold the right Option key and speak, your audio travels over an encrypted connection to be transcribed.

Here is exactly what that path is. The audio goes over an encrypted connection purely to be transcribed by a Whisper speech model running on Groq, which the app reaches through a Cloudflare edge proxy. Once the text comes back, the audio is discarded. Nothing is stored on a server, and nothing is used to train a model. There is no account and no sign-up. The round trip is around 200 milliseconds, the app is about 4 MB, and it auto-detects around 99 languages.

That design buys the cloud advantages: speed, a tiny app, and strong accuracy across many languages. It also means, honestly, that Lispr does not give you the structural privacy of an on-device tool. The audio does leave your Mac. We discard it and store nothing, but if your standard is "the audio must never leave the machine," then an on-device tool such as Apple Dictation or a local Whisper app is the better fit, and we would rather tell you that than pretend otherwise. The full path is described in our post "Where your voice goes".

Closing

Cloud versus on-device is not a question of which is better. It is a question of what you are trading. On-device gives you a privacy guarantee built into the architecture and the ability to work offline, at the cost of weight and the speed-accuracy compromise. Cloud gives you speed, accuracy, and a feather-light app, at the cost of sending audio away and trusting a policy. Lispr chose cloud, and tries to be straight about what that means. Decide which trade you are willing to make, and you have your answer.
