Talk To Your Agent More

Voice is not just faster typing. It preserves the messy context that agents need to understand what we actually mean.

Tags: ai, agents, voice, writing, workflows

By Kenny Trinh and Kira

[Figure: Terminal-whiteboard diagram showing typed prompts as clean but lossy and voice as richer context for agents]

I started using Wispr Flow for a simple reason: I felt bottlenecked.

Part of it was typing speed. I wanted a way to get more thoughts into my agents without spending so much energy turning those thoughts into polished prompts first. Dictation seemed like a practical fix: speak more, move faster, and reduce the friction between having a thought and giving it to an agent.

But after using it for a while, I noticed something more interesting than speed.

Typing does not just slow me down. Typing makes my thoughts smaller.

When I type, I compress too early. I clean up the thought before the agent ever sees it. I skip the detail that feels too small to write down. I hide the uncertainty. I remove the moment where I changed my mind. I turn a living thought into a neat instruction.

That can make the prompt look better, but it can also make it less useful.

With voice, I do something different. I say the messy version first. I include the side notes. I explain why I care. I leave in the almost-forgotten detail, the small example, the caveat, the change in direction. Then the agent can help me understand what mattered and polish it afterwards.

The surprising part is that the mess is not a problem. For a capable agent, the mess is signal.

Voice is not just faster typing

The obvious argument for voice is throughput. Most people can speak faster than they type. Dictation removes friction. That matters.

But speed is only the shallow benefit.

The deeper benefit is fidelity.

When I type, I often give the agent the conclusion. When I speak, I give it the path I took to get there.

That path contains useful information:

  • what I considered
  • what I rejected
  • what I was unsure about
  • which examples kept coming back
  • which constraints felt important
  • where my mind changed direction

A typed prompt often says: “Do this.”

A spoken prompt often says: “Here is what I am trying to do, here is why I care, here is what I am worried about, here is the thing I almost forgot, and actually maybe the real point is slightly different.”

That second version is longer. It is less tidy. But it gives the agent much more of the real problem.

This matters more as agents get better.

For weaker systems, clean instructions were everything. You had to remove ambiguity because the system could not handle much of it. For stronger agents, I suspect the tradeoff changes. Richer context may matter more than cleaner phrasing.

The agent can compress later. It can structure later. It can polish later.

But if the detail never enters the prompt, it is gone.

The writing tax

I like organizing my thoughts. I like taking a vague idea, finding the structure inside it, and making it useful.

But writing has a tax.

If I want to write a blog post, I need time to sit down and write. If I want to make a video or podcast, the production burden is even higher. The artifact starts to become the work.

That is not always what I want.

Most of the time, I do not want to “create content” in the internet sense. I just want a low-friction way to externalize a thought, discuss it, organize it, and maybe share it later.

An interview-style flow with an agent changes that.

I talk. The agent listens. We discuss the idea together. The agent reflects back the important parts, notices patterns, and helps turn the raw thought into a framework.

The flow is not:

idea → draft → edit → publish

It is more like:

thought → conversation → framework → artifact

[Figure: Terminal-whiteboard diagram contrasting the content flow with the agent-interview flow]

The artifact might become a blog post. It might become a Notion note, a decision memo, a prompt for another agent, a slide, a task brief, or just a clearer thought for future me.

That is the part that feels beautiful. The goal is not more content. The goal is less thought lost.

I do not want every idea to pay the writing tax before it can become clear.

I want my thoughts to have an interface.

Agents move the bottleneck upstream

More and more tasks are becoming agent-executable.

That does not mean everything becomes easy. It means the bottleneck moves upstream.

The scarce skill becomes knowing what is worth building, what good looks like, what tradeoffs matter, and how to communicate all of that clearly enough that an agent can execute.

A powerful agent pointed in a weak direction just creates more output.

So judgment matters more. Taste matters more. Principles matter more.

And communication becomes infrastructure.

Prompting is not magic words. It is the transfer of context, constraints, standards, and intent from the human to the agent.

If that transfer is lossy, the output gets worse.

This is why I am increasingly skeptical that typing with fingers is the final interface for working with agents.

Typing made sense for computers that needed precise commands.

Voice makes sense for agents that can understand intent.

The prompt should contain the thought

I used to think the bottleneck was typing speed.

Then I thought the bottleneck was writing.

Now I think the real bottleneck is whether my thoughts have a low-friction path out of my head before they disappear.

Voice helps because it preserves more of the original thought. It captures the context before I over-compress it. It lets the agent see not only the final instruction, but the shape of the thinking that produced it.

Maybe the future is not that everyone becomes a better prompt engineer in the narrow sense.

Maybe the future is that more people learn to talk to their agents with enough context, judgment, and honesty that the agent can become a real thinking partner.

You cannot convince me that typing with fingers is the most efficient way to do that.

Talk to your agent more.