AI + Vishing | A Terrifying Mashup

Written by: David Henderson on Jul 10, 2024

What if I told you I could mimic your voice with almost perfect results? A frightening prospect made possible with the help of AI.


The staggering pace at which AI continues to develop is inarguable. This sprawling piece of technology, gargantuan in its adoption, is undoubtedly a result of our curious human nature to explore, seek answers and push boundaries without fully understanding the end game.

AI has seeped into our everyday lives almost unchecked. With its promise to improve efficiency and relieve the pressures of monotonous tasks and ever-increasing workloads, it's easy to see why.

Although these significant advancements in AI have been made with good intentions, there is a growing threat from those with ulterior motives who seek to exploit this tech using some of the most creative methods we've ever seen.

Enter Vishing, or more simply put, Voice Phishing. This is a social engineering tactic used by crime gangs to trick individuals into revealing sensitive information over the phone. Traditionally, a Vishing attack would see a live attacker impersonate a trusted entity such as a bank to pressure an individual into revealing certain details, which in turn could be used to compromise the target's financial accounts.

Vishing is not a new concept by any means, and you've probably been a target of this type of attack at least once in your lifetime. Where things do get interesting is AI's ability to mimic, almost perfectly, the voices of those you wouldn't expect to be Vished by: those closest to you, or most influential over you.

I'll not expose which apps and GPTs we're using to create this example. I also need to caveat that doing this yourself, without a person's permission, could see you infringing on an individual's intellectual property rights and possibly breaking the law.

That aside, let's take a closer look...

Firstly, all GPTs require a training set. In this Vishing example, that's easy. In our content rich social media world, anyone influential or authoritative in nature is likely to have some form of online audio or video that can be used as a sample data set. Just think about how much recorded content exists of business leaders, politicians, law enforcement figures and even social and opinion influencers - vast amounts.

As part of my research into this article, I had to try generating an example for myself. In this case, I asked our own CEO Simon Whittaker for his permission to create an example using his voice. Because he runs the CyberTuesday® podcast and various other online events, we had the perfect training set for the GPTs to learn and mimic the characteristics of how he speaks, and we could do so legally and with express permission from my 'target'.

Once the tooling was set up, I was able to produce the GPT audio, transcribe it and generate an MP4 video clip in just a few minutes.
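For the curious, the shape of that workflow can be sketched in a few lines. To avoid naming the actual tools, every function below is a deliberately hypothetical stub: the names, signatures and return values are illustrative placeholders, not real APIs. Only the pipeline order (samples in, cloned audio out, transcript, then a packaged clip) reflects what was actually done.

```python
# Hypothetical sketch of the voice-cloning workflow described above.
# All function names and behaviours are placeholders, NOT real APIs;
# the tools used in the article are deliberately left unnamed.

def train_voice_model(sample_clips: list[str]) -> dict:
    """Stub: 'learn' a voice profile from recorded sample clips."""
    return {"voice_id": "demo-voice", "samples": len(sample_clips)}

def synthesize_audio(voice: dict, script: str) -> bytes:
    """Stub: render the supplied script in the cloned voice."""
    return f"[{voice['voice_id']}] {script}".encode()

def transcribe(audio: bytes) -> str:
    """Stub: speech-to-text pass used to caption the clip."""
    return audio.decode().split("] ", 1)[1]

def package_as_video(audio: bytes, transcript: str) -> dict:
    """Stub: mux the audio and captions into an MP4-style clip."""
    return {"format": "mp4", "audio_bytes": len(audio), "captions": transcript}

# End-to-end: training samples -> cloned audio -> transcript -> video clip.
voice = train_voice_model(["podcast_ep1.wav", "webinar_talk.wav"])
audio = synthesize_audio(voice, "Hi team, quick favour to ask...")
clip = package_as_video(audio, transcribe(audio))
```

The uncomfortable point the sketch makes is how short the pipeline is: given enough public recordings for the first step, everything after it is automated.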

Check out the results below:

Impressive, right?

And of course, it would have been rude not to have a little fun with the team on our Slack channel:

Although not perfect, the accuracy of what we can hear is staggering. More concerning is the speed at which these examples can be generated. Simply feed the GPT a script and, a few minutes later, you have an audio file ready to download and use at will.

The endless possibilities for using this tech for good are obvious: the enrichment of entertainment, sports, film, gaming, VR, AR, social interaction and so many others. However, given the incredibly invasive nature of being able to accurately deepfake pretty much anyone, the risk of this being used fraudulently seems alarmingly high.

The topic of AI regulation is one that has been circling for some time and has even become an issue discussed by global leaders. On one hand, we could be at the exciting beginning of a monumental milestone in our technological roadmap. On the other, we could see ourselves at a point where the pros fall victim to the weight of the cons. AI is undoubtedly here to stay, so now exists the challenge of safeguarding its development without stifling creativity and progress.

Rolling back to our example with Simon, a concerning possibility exists. Soon, we could see advanced real-time linguistic GPTs working in harmony with voice generation, resulting in a scenario where a target could hold a two-way conversation with an AI Simon without realising it. A powerful tool for someone seeking to do bad.

And this isn't just theory; it has happened already. Earlier this year, multinational design and engineering company Arup fell victim to a sophisticated deepfake scam that saw one of its employees unknowingly transfer a whopping 25 million dollars to fraudsters.

In the elaborate scam, the employee was targeted in a video call with what he recognised as his chief financial officer and other members of staff, who instructed and authorised him to make the transfer. The employee's early suspicions were put to rest by the apparent authenticity of the deepfake, which utilised advanced facial and voice replication techniques.

The examples used in this article highlight just one way in which businesses could see themselves being attacked in the future. However, they lead to other, more alarming scenarios that reach beyond the ability to commit financial fraud. That is, our own ability to shield ourselves from the influence of content that is untrue or non-factual.

Failing to control or even verify such content has the potential to make global politics, stock markets, public opinion and more vulnerable to external influence and tampering.

So, AI. Are you here for the good or for bad? Let's wait and see.

Need help?

Email Us
