KVNCNNLLY

Nuance, Specificity and the Rise of Voice

I’ve been trained to keep things short, and so have you, which is part of what I want to talk about today¹.

This is rapidly becoming obvious, but it’s worth jotting down: we’re at the beginning of a major transition to voice as a table-stakes UI. If you look for it, you’ll see it in many places: applications and services are integrating voice into their offerings at a rapid clip. Why is that? The reasons are probably obvious to you, but I think there are a few nuanced points worth reflecting on here.

  1. There is a real demand for voice, but what’s happening now is not really about high-quality speech-to-text technology
  2. Demand is not being driven by voice-first features (it’s people learning to be specific)
  3. A lot of people, mostly adults, have to unlearn how to interact with machines when it comes to search, queries and what we now generally refer to as prompts

On the first point, I like to imagine a hypothetical Siri in the year 2020 that is perfect as far as speech-to-text transcription goes. This technological capability might incentivize further application development, but I don’t see a revolution. In short, supply-side improvement is not sufficient to generate demand.

A fair counterpoint would be “what about OpenAI releasing Whisper? That created quite a stir, and everything has changed!” I agree with that, but Whisper was also released alongside the LLM revolution, so it’s a bit hard to decouple from the underlying technology wave.

That brings me to my second point, which is that voice-centric features are intriguing, but I don’t think they create deep and broad demand for voice. To use a statistics analogy, what we want to pinpoint is the lurking variable behind the observed demand. That is what points to the causal factor actually driving the trend.

An example of this is Loom, a tool that allows easy screen recording and sharing on the fly. To say Loom was a success because of the product they offered would be a bit backwards. The demand for that product comes from users being trapped in the relative molasses of writing everything out by hand. Whether or not Loom exists, a user who has (or wants) to explain what’s happening on their screen will have a latent desire to do so with minimal effort.

So, it’s not so much the raw technology or the existence of products that drives the demand. It’s more that demand precipitates from the user’s particular context or circumstance, which taps into the more universal desire to do things well, cheaply, quickly, or enjoyably, as that’s how humans tend to operate.

Our context of late is pretty clear: we’re typing a whole lotta stuff at computers. But we’ve been collectively doing that for decades now, so what’s different?

What is well known to people at the forefront of AI usage is that we are now in a world where specificity and detail are really valuable. When you interact with LLM-based agents or services, they are remarkably good at handling nuanced and unique details, and the more verbose you are, the better (in most cases). In other words: the conversational nature of interfacing with an LLM is driving the demand for voice.

To lay that out more clearly: we’ve created a technology that thrives on textual input, and in particular on nuanced and detailed input. That technology has now become omnipresent and quite powerful, creating a context where many people are typing to complete tasks in ways they never typed before. All this typing is now creating a demand for voice, given the basic time-savings economics: speaking is much faster than typing.

Now that we’re in the LLM era, the smart user doesn’t want to write a short prompt; they want to add detail. The user is trying to find a way to efficiently provide more detail, because that’s what helps them.

This might sound pretty obvious, but if you look around the software space, the reality of “short searches” is everywhere. The typical UI element for search is not a giant long-form text entry field. It’s typically a one-line text input of modest width, designed for a query like “keyword1 keyword2” or a few short descriptive details.

On the topic of search, I come to my final point about unlearning, by way of personal anecdote.

This was about a year ago, when I had just started exploring vibe coding in earnest. I hit some technical bugs, naturally, and had tried variations of Google searches to resolve my woes. No solution could be found.

If you think about how you and the rest of society use Google search, it’s typically a few keywords (“San Francisco Showtimes” and that kind of thing). What you don’t do is put in a very long and specific search string, and you’re careful with double quotes for exact matches because you most likely won’t get any results. There are various tips and tricks, and certain “styles” of using Google, especially when it comes to programming or technical topics.

At some point I decided to simply ask Claude² in very specific detail why what I was encountering was likely happening, and I got back a detailed answer that helped resolve my issue. All I remember now is the distinct feeling of having violated the way search “should” work, and then internalizing that I was dealing with a truly different technology. I wish I had saved the question I asked, but I didn’t, so all I have left is the emotional imprint.

I want to clarify that at this time I certainly knew that LLMs were generally very capable, but there was some slight disconnect in this particular context. Perhaps creating a full-stack app from nothing to something in a single night had short-circuited my brain temporarily, but I was far from unfamiliar with LLMs at the time. I think that’s important to reflect on.

So what’s the point of that little interlude? I can’t help but feel my approach and slowness on the uptake were due to having effectively been conditioned by using Google for 25 years. Sure, writing a detailed prompt into a magic machine and getting back an amazing answer is not going to be intuitive to anyone, but for the Google generation I feel like it’s going to be even less intuitive.

If you have a chance to let someone use ChatGPT for the first time, just tell them “search for something” or “ask a question” and watch. You’re probably going to get a Google search-esque query: short, few keywords, not overly detailed.

I don’t think having to unlearn instincts honed by Google search practices will provide long-lasting friction. That friction is likely marginal relative to the scale of the AI revolution: even Google itself is rolling out “AI Mode” in search, not to mention the explosion in AI-native services. Behaviors may be reset pretty rapidly as far as typing lengthier search queries and prompts goes, and that, inevitably, will lead people to think “wouldn’t it be nice if I didn’t have to type all of this out?”


  1. And yes, now that I’ve said that, here’s my long meandering piece. If you want to bail here, I understand. But keep in mind Siri has been integrated into iOS since 2011, and this is where we are today.↩︎

  2. My first “vibe coding” project was an ad hoc sprint on a Friday night, after having heard so much hype-y commentary about spec-to-app building that I decided I’d see how far I could get. I wrote a spec with modest details, made some visual mockups in Excalidraw, and used Claude on the web and VSCode. I got a local app working! ↩︎

#Tech #Product #AI