Displacing Human Labor: Should Technologists Care?

July 31, 2021

By Ahmed Bouzid (Founder & CEO at Witlingo)


Among the show notes of a recent episode of VO BOSS, the premier podcast that focuses on the world of voiceover talent and voice acting, one can read this blurb:

Do you have a plan B for your voiceover business? Now is the time to up your acting game so a robot can never replace you! Anne welcomes the dynamic duo of Nic Redman and Leah Marks, hosts of the longest-running, most-listened-to voiceover podcast in the UK! These three BOSSES talk about how to make yourself indispensable as a voice actor and how to advocate for proper AI voice regulation.

As a lifelong card-carrying member of the Human Language Technology (HLT) community and someone who is fascinated by the topic of how digital tech automation is affecting the human labor force, I jumped on the opportunity to ask my fellow HLT travelers, especially those who are assiduously building Text to Speech (TTS) software that emulates as closely as possible the way human beings speak, what they thought and how they felt about this. (To ensure that my question had a chance of being noticed, I tagged several members of that community.)

So I wrote: “I would love to hear what folks who are actively working on building Text to Speech software have to say. Is this a zero-sum game or a win-win situation, or something else? How about fair compensation? The anxiety is real and not altogether unjustified. Myself, I have learned that things do evolve in mysterious ways, so I can’t say that I have a clear-cut stand. That said, I feel that it’s important to try our best to think through things like this rather than just say (what we in tech routinely and conveniently do): ‘Oh well, that’s the nature of progress. Things will work themselves out eventually.’ Which is not a crazy stand, except for this: people are not fungible….”

Out of the people that I tagged in my note, only one person responded. Here is their response:

Worth thinking carefully about the difference between turning hard to access voice and audio and video digital content into something that can be searched and indexed and value extracted from vs. Text to Speech where the performance of an individual gets used – very different spaces – the huge value lies in extracting value from hard to access audio and video.

I responded by pointing out to him that yes, what he said was indeed true, but that “the threat remains, it seems, regardless.” He responded with:

Agreed but this is like comparing apples and oranges – we have teams of amazingly talented people working super hard to unlock value from the immense quantity of audio and video being produced every second – through video meetings, Insta videos, TikTok, every call to every person – it makes me sad to see you group this talent with people trying to exploit creative performers – it’s a very different group of people and I don’t feel great about grouping our talented people doing something awesome with a group of people exploiting creatives.

My response to him was:

No one is questioning the talent, integrity, or value of the output, of folks working in the space of human language tech (heck, I’m one of them and I’m no lover of self-flagellation). But the fact does still remain that technology that delivers human-sounding voice (Text To Speech, which is not Speech To Text, which seems to be what you are talking about) seems to be threatening the livelihood of voice-over artists. This does not make the people working in the TTS space especially morally or ethically suspect, since, it seems to me, this is just another instance of automation potentially displacing human labor. But I am curious what folks in the TTS space have to say about this, if anything. I am looking for their insights as creators of this technology, as they may have knowledge about the delta between a human-created voice and an AI-created voice that may help us understand the nature of the threat, if there is a threat, or even new opportunities that are on the horizon for voice-over talents that we are not seeing….

I then added: “Folks in the Speech to Text (STT) space (those who work on Automatic Speech Recognition software that transcribes speech to text) are also displacing human labor: those who make their living transcribing audio. The interesting thing, and maybe we can pick something up here if we tried to think hard, is that STT tech has been a boon for human transcription work (at least for now) because to build these STT engines and keep them performing, hordes of transcribers are needed to help the STT engines do their job.”

What are your thoughts? Especially if you are in the business of creating human language technology.

You can join this active LinkedIn thread if you wish to help enrich the conversation.


