AI VOICE CLONING – Unleashed

by Khushbu Jain and Ninad Barge - May 16, 2024, 7:30 am

In an age where technology seems to blur the lines between reality and fiction, voice cloning emerges as a potent example of our evolving digital landscape. Voice cloning, the process of replicating a person’s voice using advanced algorithms, has garnered both fascination and concern worldwide.

AI voice cloning has remarkably diverse applications. It can preserve a legacy by keeping the voices of loved ones alive for generations to come, so it feels as though they never left. It also has medical uses, giving a voice back to people who are losing theirs to illness or disability; Apple's iOS 17 introduced a voice-cloning feature to aid such people. Its creative potential is equally broad: a cloned voice of your choosing can narrate a bedtime story or voice characters in audio games, and custom virtual assistants can make digital interaction more immersive. Music producers even clone the voices of actors and singers; YouTube's Dream Track, for instance, creates song clips featuring AI vocals with the permission of participating pop stars.
It is fascinating to innovate with this kind of technological magic. But freely accessible software like this also enables some of the great modern-day cyber-heists!

All It Takes Is a Few Dangerous Seconds. THAT'S IT!
(a) Imagine your child crying on the phone and asking for help. Any parent would panic and do as the child says, but in reality it is not the child on the phone; it is a cloned voice that the parent is hearing. A similar case happened to Sarita Khanna, the middle-aged wife of a businessman from Khargone in Madhya Pradesh. She got a call from an unknown number saying her daughter had been kidnapped and demanding Rs. 3 lakh, with her daughter heard crying on the phone, and she was tricked into sending the money. In reality, her daughter was safe and sound at her hostel.

(b) In another example, a Lucknow-based government official received a call from an unknown number claiming that his son, who was studying in Bengaluru, had been caught with narcotics and was in police custody, and demanding Rs. 1 lakh to release him; the boy's voice could be heard saying "papa mujhe bacha lo" ("Papa, save me"). After hearing his son's voice, he sent the money without thinking twice.

(c) Another example is retired Coal India officer PS Radhakrishnan, who was duped into paying Rs. 40,000 in response to a call from his old friend of 40 years, Venu Kumar, who had apparently contacted him after nearly a year of no communication to ask for his assistance with an urgent financial transaction. Radhakrishnan stated that the voice on the phone sounded exactly like his friend's, and they talked for a long time, leading him to assume it was Venu Kumar himself; in reality, it was yet another AI-cloned voice at work.

(d) In an unusual fraud, AI-generated cloned audio was used to defraud a United Kingdom-based energy firm of $243,000. According to a report in the Wall Street Journal, in March 2019 the fraudsters used voice-cloning AI software to impersonate the voice of the chief executive of the firm's Germany-based parent company in order to effect an illegal fund transfer. The cybercriminals called the U.K. firm's CEO posing as the CEO of the parent company, demanded an urgent wire transfer to a Hungary-based supplier and assured the U.K. firm's CEO of reimbursement. After the money was transferred, it was forwarded to an account in Mexico and then to other locations, making it difficult to identify the fraudsters.

How to protect yourself?
If bad actors are using voice cloning to mimic voices and commit crimes, it is important for us to stay vigilant. There are some common signs and red flags that you can look out for:
1. If you are answering a call from an unknown number, let the caller speak first. If you say as much as "Hello? Who is this?", they could use that audio sample to impersonate you.
2. If you receive a call or message from someone you know, but they make unusual or out-of-character requests, it could be a sign of voice cloning. For example, if a friend or family member suddenly asks for sensitive information or money, proceed with caution and verify their identity through other means.
3. Voice cloning technology may not perfectly replicate the original voice, leading to subtle inconsistencies. Pay attention to any noticeable changes in tone, pitch, or pronunciation that are out of character for the person you’re communicating with.
4. Cloned voices may have lower audio quality or exhibit artifacts. If the voice on the other end of the call sounds distorted, robotic, or unnatural, it could be a sign of voice cloning.
5. If the background noise during a call seems inconsistent with the expected environment of the caller, it could indicate a cloned voice. For example, if you hear noises that don’t match the typical sounds of a workplace or home, it may be a cause for suspicion.
6. Voice cloning scammers may try to create a sense of urgency or pressure to manipulate you into providing sensitive information or taking immediate action. Be cautious if the caller insists on quick decisions, especially if it involves financial matters.
7. Voice cloning scammers often use caller ID spoofing techniques to make it appear as if the call is coming from a trusted source or a legitimate organization. If you receive a call from a known number but the voice or the content of the conversation seems suspicious, consider contacting the person or organization directly using a verified contact method to confirm the call’s legitimacy.
8. Voice cloning attempts may involve the use of pre-recorded responses or scripts. If the person on the other end of the call consistently provides robotic or repetitive responses that do not directly address your questions or concerns, it could be an indication of a cloned voice.

If someone becomes a victim of such a scam, they may find recourse under the following sections, among others:
Under the Information Technology Act, 2000:
- Section 66C: Punishes identity theft and can apply if the fraud involves the deceptive use of someone's voice identity.
- Section 66D: Punishes cheating by personation using a computer resource and is relevant if the fraudster impersonates another person using technology.

Additionally, relevant sections of the Indian Penal Code, 1860 may apply:
- Section 419: Applies if the fraudster impersonates someone else to deceive the victim.
- Section 420: Applies if the scam involves deceiving someone into transferring money or property.

Way Forward
Law enforcement agencies, technology companies, telcos and research institutions need to collaborate to develop advanced voice authentication and anti-spoofing techniques. These techniques should aim to identify synthesized or cloned voices and differentiate them from genuine human voices.
As for integrating these technologies into phones and communication platforms, some progress has been made in implementing call authentication frameworks. For example, the STIR/SHAKEN framework has been introduced in some countries to verify the authenticity of caller IDs and detect spoofed calls. While these frameworks primarily focus on verifying caller ID information, they can indirectly help in identifying potential voice cloning attempts. India, too, is working towards a multi-pronged approach by introducing the Calling Name Presentation (CNAP) service, which will show call receivers the name registered against the SIM card being used by the caller.
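To illustrate what caller-ID authentication looks like at the protocol level, here is a minimal sketch assuming a STIR/SHAKEN-style PASSporT token (a signed JWT carrying the originating and destination numbers and an attestation level) and the PyJWT library; the function, the certificate handling and the acceptance policy shown are simplified illustrations, not any carrier's actual implementation.

```python
# Minimal sketch: verifying a STIR/SHAKEN-style PASSporT caller-identity token.
# Assumes the verifying service has already fetched and validated the signing
# certificate's public key; numbers, keys and policy here are illustrative only.
import jwt  # PyJWT

def verify_passport(token: str, public_key, called_number: str) -> bool:
    """Return True only if the token is validly signed and vouches for a call
    to the number on which we actually received the call."""
    header = jwt.get_unverified_header(token)
    if header.get("ppt") != "shaken":  # PASSporT extension used by SHAKEN
        return False
    try:
        claims = jwt.decode(token, public_key, algorithms=["ES256"])
    except jwt.InvalidTokenError:
        return False
    # Attestation "A" means the originating carrier fully vouches for the
    # caller's right to use the displayed number; "C" is the weakest level.
    if claims.get("attest") != "A":
        return False
    destination_numbers = claims.get("dest", {}).get("tn", [])
    return called_number in destination_numbers
```

A phone or platform that receives such a token along with a call could run a check like this and warn the user when verification fails, even though it cannot by itself prove that the voice on the line is genuine.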
Another such approach is the vision of establishing a National Cyber Security Agency (NCSA) to serve as a centralized organization responsible for addressing digital frauds. The NCSA should also invest in developing voice analysis algorithms and machine learning models that can analyze voice patterns, acoustic characteristics and linguistic markers to detect anomalies that may indicate AI voice cloning. These technologies can be integrated into communication platforms or phone apps to provide real-time detection and alerts for suspicious calls.
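As a conceptual sketch of the kind of voice analysis described above, the example below (in Python, using the librosa and scikit-learn libraries) summarises audio clips as MFCC statistics and trains a simple classifier on labelled genuine versus cloned samples; the file names and labels are hypothetical placeholders, and real anti-spoofing systems rely on far richer features and models.

```python
# Conceptual sketch: flagging possibly cloned voices from acoustic features.
# File paths and labels are hypothetical placeholders for a labelled dataset
# of genuine and AI-cloned recordings.
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression

def mfcc_features(path: str) -> np.ndarray:
    """Summarise a clip as the mean and standard deviation of its MFCCs."""
    audio, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=20)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Hypothetical training data: label 1 = AI-cloned, 0 = genuine recording.
clips = ["genuine_01.wav", "genuine_02.wav", "cloned_01.wav", "cloned_02.wav"]
labels = [0, 0, 1, 1]

X = np.stack([mfcc_features(path) for path in clips])
classifier = LogisticRegression(max_iter=1000).fit(X, labels)

# Score a new, suspicious call recording.
suspect = mfcc_features("incoming_call.wav").reshape(1, -1)
print("Estimated probability the voice is cloned:",
      classifier.predict_proba(suspect)[0, 1])
```

A deployed detector would of course need far larger training sets and more robust features, but the pipeline above captures the basic idea of scoring a call's audio for signs of synthesis and alerting the recipient in real time.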

Khushbu Jain is a practicing advocate in the Supreme Court and founding partner of the law firm Ark Legal. She can be contacted on X: @advocatekhushbu. Ninad Barge is an intern at Ark Legal.