We often say or hear others say that a picture is worth a thousand words. But how many words — or even pictures for that matter — are worth a voiced, spoken word?
Take this: “I don’t believe it! Is this for real?”
On the surface of it, the exclamation appears to be negative — and in fact, if you run it by the Google sentiment analysis tool, which examines the text only, you will get a decidedly negative sentiment score.
And yet, one can easily imagine a scenario where the exclamation sits at the other end of the spectrum, expressing a very positive feeling: a reaction to some delightful news, such as landing a long-sought client we thought we had lost, or receiving a prestigious award we didn’t even dare dream of.
Now, let’s imagine that this exclamation was written in, say, an email exchange. For us to understand that it expressed something positive, we would need to read the surrounding emails, that is, the context within which the exclamation was written.
But if the exclamation was uttered, that is, spoken aloud, and we had access to just the audio, we would probably not need more than that. The audio itself would probably contain enough cues for us to tell which of the two readings is the right one: the earnestly positive one or the disheartened, negative one.
Here’s an example of how a negative one would sound.
And here’s an example of how a positive one would sound.
In other words, and remarkably, you really didn’t need more than a self-contained audio clip to accurately assess my true sentiment as the speaker. Someone who doesn’t know anything about me, what kind of person I am, or what my likes and dislikes are, and who is not privy to the context of my utterances, would probably be able to pin down exactly how I feel. Such is the astonishingly expressive power of audio.
In addition to emotion, we would probably be able to tell many other things: whether the speaker is a man or a woman, and whether the speaker is someone we know or a stranger. We would probably also be able to tell their age and their personality (are they easygoing, aggressive, kind, timid, domineering?). From their accent, we would probably be able to tell their ethnicity, the region they are native to (Brooklyn or New Orleans or Boston), whether they are a native speaker of the language or not, and so forth.
Beyond audio, we could also pick up cues from their conversational style: do they interrupt the other person or do they let them speak? Do they hesitate when answering questions or do they answer promptly? Do they answer questions directly or do they answer them evasively? That’s a lot of information about the speaker and their state of mind.
What is also powerful about voice and audio is that one can assess a speaker’s emotions firsthand. The assessment is primary as opposed to secondary: we make it ourselves rather than receive one that is mediated by the speaker as they reflect on how they feel. Asking someone the explicit question ‘Do you like this product?’ will undoubtedly draw many frank responses, positive or negative. But what about the majority of responses, the polite ones? These are answers that, if one were to simply read the text, sound positive on the surface but upon closer examination are at best ambivalent. Hesitations, pauses, forced tones, words spoken through clenched teeth: all of that is right there, embedded in the audio, and we humans are remarkably good at picking up the signals.
Voice audio is also powerful because it lets us capture what someone is feeling in the moment, while they are doing what they are doing. Unlike reading and writing, speaking and listening do not require our eyes to be focused on anything to give or receive information. We can speak and listen with our eyes closed. This opens up a world of contexts where we can now assess sentiment when we couldn’t before: in dark rooms, while driving, while watching TV, reading, drawing, potting a plant, admiring a painting, and so on.
Similarly, unlike writing and reading, speaking and listening do not require us to use our hands. We can express ourselves with our hands occupied doing something: holding a book, typing on a laptop, preparing food, folding laundry, holding a vase, putting on mittens, taking a bath or a shower, and so on.
Last: in normal circumstances, uttering a few words to a smart speaker or to your AirPods requires far less effort than typing text or navigating (tapping or swiping) on a small screen. No laptop or smartphone needs to be found, powered up, and signed into. If you can hear and speak, you just hear and speak.
Accurately assessing how our customers and prospects are really feeling, and acting on that information, can often mean the difference between retention and churn, or between acquisition and loss. The tools are finally here for marketers and customer care managers to leverage the wealth of information that voice, audio, and conversational AI offer. It’s time to listen carefully to those who can make or break your business.