How I used deepfakes to bypass security verification at a bank

When a new tool called "Deepfake Offensive Toolkit" was released, claiming you could now inject real-time deepfakes into a virtual camera and bypass biometric liveness checks, I was thrilled! As you may have noticed, all my recent posts relate to fake identities and bypassing KYC checks at financial institutions. So I thought: why not try to bypass biometric verification with the power of machine learning?
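To make the idea concrete: a tool like this sits between a deepfake model and a virtual camera device, feeding swapped frames to whatever app reads the camera. A minimal sketch of that plumbing in Python; pyvirtualcam is a real library, but face_swap() is a placeholder for the model inference:

```python
# Minimal virtual-camera injection plumbing. pyvirtualcam requires a
# virtual camera backend (e.g. OBS Virtual Camera) to be installed.
import cv2
import pyvirtualcam

def face_swap(frame):
    # placeholder for the deepfake model's per-frame inference
    return frame

cap = cv2.VideoCapture(0)  # the real webcam

with pyvirtualcam.Camera(width=640, height=480, fps=30) as cam:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame = face_swap(cv2.resize(frame, (640, 480)))
        # pyvirtualcam expects RGB frames; OpenCV delivers BGR
        cam.send(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        cam.sleep_until_next_frame()
```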

Before putting everything in place, I needed a target app and testing conditions. After looking at a few neobanks on my phone, I found one that asks for a short video recording in certain account-recovery cases, when the bank is unsure about your identity. You still need a lot of information at hand to get access to the account: photos of IDs, access to email and, potentially, card details. A decent deepfake also requires hours of high-quality video or photos of the person, which is hard to get if the criminals don't know the victim. But everything changes if it's someone on the inside. Your teenage kid, your angry ex or your business partner will have photos of your ID, private videos and temporary access to your email and phone. This is our attacker.

With the prerequisites figured out, we can play with ML frameworks. But before making fake videos, we need to walk through the unmodified verification scenario to establish a reference point. I reset the app, triggered the verification conditions, recorded a 640x480 video and, using my jailbroken iPhone, successfully submitted it and passed the verification checks.
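The reference clip itself was nothing fancy; normalising a phone recording down to 640x480 boils down to a single ffmpeg call (file names illustrative, ffmpeg must be on PATH):

```python
# Normalise a phone recording to the 640x480 clip the app expects:
# scale to cover the target size, then centre-crop.
import subprocess

subprocess.run([
    "ffmpeg", "-i", "raw_recording.mov",
    "-vf", "scale=640:480:force_original_aspect_ratio=increase,crop=640:480",
    "-r", "30",             # constant 30 fps
    "-pix_fmt", "yuv420p",  # widest player/decoder compatibility
    "reference.mp4",
], check=True)
```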

During the reference check, I found that:

  • I can trigger verification requests and automatically substitute the video files submitted for verification.

  • I can run as many verifications as I want. Even if one attempt fails, I can trigger another that isn't affected by the previous results.

  • Each verification takes a different amount of time to approve, so the checks are likely done by humans.

Reference recording

Example 1. Deepfake Offensive Toolkit, try 1

I asked a friend with a similar haircut and facial features to record the test video and send it to me, so I could apply the deepfake toolkit to it. Unfortunately, the quality was nowhere near satisfactory:

Who is this person?!?

Status: verification wasn't initiated

Example 2. Deepfake Offensive Toolkit, try 2

After fruitlessly playing with the different configuration options, I gave up and decided to apply my own photo to my own video!


Before and After

This is something I can work with! At first, I was sure the video verification would fail and I would have to provide more documents. Not at all! The verification came back successful despite obvious traces of manipulation: a blurry face, uneven facial edges and telltale signs of editing in the resulting file.

Status: verification passed

Example 3. DeepFaceLab, try 1

I learned that real-time deepfakes are still far from realistic. But I didn't need real-time substitution, so I tried another project, DeepFaceLab. This framework produces impressive results given good-quality video sources and enough resources spent on model training.

But to save some time, I started with the same premise: substituting myself for myself. I recorded a 1000-frame video of myself, trained a SAEHD model on it and merged the result onto another video of mine.
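Roughly, the pipeline looked like this. The subcommands are per DeepFaceLab's main.py, though exact flag names vary between versions of the project, so treat this as a sketch:

```python
# The DeepFaceLab extract/train/merge pipeline, driven from Python.
# Flag names follow the project's helper scripts but may differ
# between versions; verify against your checkout.
import subprocess

def dfl(*args: str) -> None:
    subprocess.run(["python", "main.py", *args], check=True)

# 1. extract aligned faces from the source and destination frames
dfl("extract", "--input-dir", "workspace/data_src",
    "--output-dir", "workspace/data_src/aligned", "--detector", "s3fd")
dfl("extract", "--input-dir", "workspace/data_dst",
    "--output-dir", "workspace/data_dst/aligned", "--detector", "s3fd")

# 2. train the SAEHD model (interactive; runs until you stop it)
dfl("train", "--training-data-src-dir", "workspace/data_src/aligned",
    "--training-data-dst-dir", "workspace/data_dst/aligned",
    "--model-dir", "workspace/model", "--model", "SAEHD")

# 3. merge the trained face back onto the destination frames
dfl("merge", "--input-dir", "workspace/data_dst",
    "--output-dir", "workspace/data_dst/merged",
    "--aligned-dir", "workspace/data_dst/aligned",
    "--model-dir", "workspace/model", "--model", "SAEHD")
```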

This is a fake me, generated by the DFL framework. Again, the signs of a deepfake are obvious: the substituted facial shape is blurry and would trick only someone with very poor eyesight. But this also worked! So far, so good.

Status: verification passed

Example 4. DeepFaceLab, try 2

I took the original video I had requested from my friend and applied a re-trained model to it:

The result is more decent than my first try but still far from perfect. A few problems:

  • Different face colours, making the edges more obvious. As I found later, the right DFL model parameters fix this (see the colour-transfer sketch after this list).

  • A straight edge on the fringe's shadow. This could be rectified with the right destination video and proper mask training.

  • No glasses, which could raise a red flag for video-recognition services. Unfortunately, wearing glasses creates horrible artefacts; these conditions need further examination.

  • The destination recording had the wrong articulation, which could trigger another red flag.
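On the first point: DFL's colour-transfer modes essentially match the colour statistics of the swapped face to the destination frame. A minimal Reinhard-style version of the same idea, for illustration:

```python
# Reinhard-style colour transfer in LAB space: shift the swapped
# face's per-channel mean/std to match the destination frame. This is
# roughly what DFL's ct_mode options (e.g. "rct") automate at merge time.
import cv2
import numpy as np

def color_transfer(face: np.ndarray, dest: np.ndarray) -> np.ndarray:
    face_lab = cv2.cvtColor(face, cv2.COLOR_BGR2LAB).astype(np.float32)
    dest_lab = cv2.cvtColor(dest, cv2.COLOR_BGR2LAB).astype(np.float32)
    f_mean, f_std = face_lab.mean(axis=(0, 1)), face_lab.std(axis=(0, 1))
    d_mean, d_std = dest_lab.mean(axis=(0, 1)), dest_lab.std(axis=(0, 1))
    out = (face_lab - f_mean) / (f_std + 1e-6) * d_std + d_mean
    return cv2.cvtColor(out.clip(0, 255).astype(np.uint8), cv2.COLOR_LAB2BGR)
```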

Alas, this video didn't pass the checks. But I was sure it was possible to create the right video to pass the verification procedure.

Status: verification failed

Example 5. No glasses

No one tried to block me permanently after a few failed checks. Amazing, I thought! It means I can probe some of the black-box conditions of the verification process through trial and error.

As glasses are integral to some people's lives, what if their absence was the main reason for the failed verification? I sent an unmodified video of myself without glasses, and I failed the verification! Even though absolutely nothing had been changed!

Status: verification failed

Example 6. DeepNostalgia + Wav2lip + Face-SPARNet

Now that we know glasses are crucial, I know what I need in the final video. There is only one problem: my deepfakes with glasses had always come out in very low quality.

Alexander, a brilliant ML scientist, suggested I look at https://www.myheritage.com/deep-nostalgia to animate a photo of me wearing glasses, so that the glasses wouldn't stray outside my face shape. I used it to create a short video of myself:

Not the best quality, but not the worst either.

Next, we need to get rid of the watermarks (apologies for that) and apply the Wav2lip framework to make me "pronounce" the words. That lowered the quality of the final video even further, so I enhanced it with Face-SPARNet.
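The Wav2lip step, for reference, is a single call to the upstream repo's inference script (the flags below are Wav2Lip's own; Face-SPARNet lives in a separate repo with its own test scripts, so it's only noted here):

```python
# Lipsync the animated photo to the target audio with Wav2Lip
# (Rudrabha/Wav2Lip). File names are illustrative.
import subprocess

subprocess.run([
    "python", "inference.py",
    "--checkpoint_path", "checkpoints/wav2lip_gan.pth",
    "--face", "deep_nostalgia_clip.mp4",  # the animated photo
    "--audio", "passphrase.wav",          # what the bank wants to hear
    "--outfile", "lipsynced.mp4",
], check=True)
```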

Unfortunately, the final quality was not good enough, and the verification failed.

Status: verification failed

Example 7. Animate Photos AI + Wav2lip + Face-SPARNet + DeepFaceLab

At one point, something clicked in my head. If all I need is a three-second recording, I can take a single photo of the victim, create an animated video from that photo alone, and then reapply high-res sources to it using DeepFaceLab to improve the quality drastically! The final steps, addressing the problems from the previous examples:

1. The DFL destination video quality was improved by purchasing Animate Photos AI, which also took care of the watermarks.

2. Animate Photos AI detects the face and produces a square video, so I had to "fool" the algorithm into making a video I could later cut down to a 640x480 product:

3. Wav2lip and post-editing with Face-SPARNet created additional visual artefacts at the bottom of the video:

The Wav2lip-HQ fork didn't produce a lipsync of sufficient quality, so I recorded my own lips saying what I needed and pasted them on top of the video. As the face would be substituted with DFL later anyway, it doesn't matter! (Both the crop and the overlay reduce to the ffmpeg calls sketched after this list.)

A perfect source for a lipsync.

4. Take the rest of the high-res video of the "victim" and use it as the SRC for DFL. I trained the SAEHD model from scratch on Google Colab, and honestly, the results were impressive:

[Video: fake-deom.mp4]
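For reference, steps 2 and 3 reduce to a crop and an overlay; in ffmpeg terms, with illustrative file names and coordinates:

```python
# Step 2: centre-crop the square Animate Photos AI output to 640x480.
# Step 3: paste the separately recorded lips over the mouth region;
# DFL replaces the whole face afterwards, so the seam doesn't matter.
import subprocess

subprocess.run([
    "ffmpeg", "-i", "animated_square.mp4",
    "-vf", "crop=640:480",  # crop defaults to the centre of the frame
    "cropped.mp4",
], check=True)

subprocess.run([
    "ffmpeg", "-i", "cropped.mp4", "-i", "lips.mp4",
    "-filter_complex", "[0:v][1:v]overlay=260:300",
    "destination.mp4",
], check=True)
```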

Status: verification passed

Final example. RealTimeVoiceCloning

While testing whether the lipsync could be a problem, I decided to make a video in which the lips don't match what the voice is saying. On top of that, I decided to generate a completely artificial voice: an insider may have recordings of the victim's voice, but hardly the exact sentence the bank expects them to say. And since this article is about deepfake tools, I took the Real-Time-Voice-Cloning framework. After a short training on 10-15 minutes of voice samples, I produced a decent-quality voice fake saying whatever I wished:

[Audio: test voice.mp4]
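The whole cloning pipeline, condensed from the repo's demo_cli.py, is surprisingly short. Checkpoint paths depend on the repo version, and the target sentence is a placeholder:

```python
# Voice cloning with CorentinJ/Real-Time-Voice-Cloning, condensed from
# the repo's demo_cli.py. Checkpoint paths vary between repo versions.
from pathlib import Path

import numpy as np
import soundfile as sf

from encoder import inference as encoder
from synthesizer.inference import Synthesizer
from vocoder import inference as vocoder

encoder.load_model(Path("saved_models/default/encoder.pt"))
synthesizer = Synthesizer(Path("saved_models/default/synthesizer.pt"))
vocoder.load_model(Path("saved_models/default/vocoder.pt"))

# embed the victim's voice from a sample recording (10-15 minutes of
# audio is enough for a decent fake)
wav = encoder.preprocess_wav(Path("victim_sample.wav"))
embed = encoder.embed_utterance(wav)

# synthesize the exact sentence the bank expects to hear
specs = synthesizer.synthesize_spectrograms(
    ["<the sentence the bank asked for>"], [embed])
generated = vocoder.infer_waveform(specs[0])

sf.write("cloned_voice.wav", generated.astype(np.float32),
         synthesizer.sample_rate)
```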

As you can hear, this voice sounds quite far from my original one. In addition, I recorded a video of myself saying words different from those on the audio track:

The audio track is not synced with the video

So again, I wasn't sure if I would pass the verification.

It worked! This means the bank's employees have no reference for my voice, and whoever checks the recordings pays more attention to the video than to the audio.

Status: verification passed

Reflections on the verification testing process.

I was lucky: I found testing conditions that let me trigger verification requests and fail or pass without directly affecting further tests. Such conditions allow trial-and-error testing, exactly what I needed.

Based on the different turnaround times of each verification, I concluded that humans did the final check: submissions during busy hours and outside working hours took longer to approve than off-peak ones. And if checks are made only by humans, there is always a bias factor; the same video, sent again and again, could eventually be accepted.

When you create a deepfake, the facial match between the source and the destination videos is important, as are lighting conditions, video quality and even smaller details: what glasses do you wear, what fringe do you have? In my tests, I allowed myself some flexibility that criminals would not have.

The verification process is still a black box: I don't know whether all my tests cumulatively affected the outcome, or whether any previous recordings are available to the staff.

Overall, if you have enough time to produce something like this: https://www.youtube.com/watch?v=h1Rr9X5QuIk, then ultimately any verification done by humans can be bypassed.

Recommendations for banks.

And, really, for everyone who relies on photo/video/audio verification:

1. Control your environment! I don't know how to stress this enough, but this is the key. If your customer's iPhone is jailbroken, every piece of information that goes from the phone to your APIs can be modified, and there is no trust in that data.

2. Analyse content from every possible angle. Start with ML detection toolkits for deepfakes; that will give you an advantage. There are obvious signs of modified video, such as metatags and encoder strings in the input (a naive check is sketched after this list), and less obvious ones: uneven gradients, blur, etc.

3. Collect your historical data and learn from it. Why did this customer's recording suddenly change voice? Why did they submit dozens of verification requests? Why does their Device ID change every time? Why does someone send the same video again and again, hoping it will eventually be accepted?

4. With deepfakes becoming a commodity, you need to consider making your verification steps more complex. Some KYC providers require video calls facilitated by staff instead of simple recordings; they ask you to do unusual things: tilt your document, turn left, turn right. You can offset the technological advantage of your adversaries by deterring them with more complex checks.
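On point 2: even a trivial metadata check would have flagged several of my submissions. A naive sketch, nowhere near real deepfake detection but better than nothing (requires ffprobe on PATH):

```python
# Flag videos whose container metadata names an editing tool rather
# than a phone camera (ffmpeg's muxer, for instance, stamps files
# with "Lavf...").
import json
import subprocess

def suspicious_encoder(path: str) -> bool:
    out = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_format", "-show_streams", path],
        capture_output=True, text=True, check=True)
    tags = json.loads(out.stdout).get("format", {}).get("tags", {})
    encoder = tags.get("encoder", "").lower()
    return any(t in encoder for t in ("lavf", "ffmpeg", "handbrake"))

print(suspicious_encoder("submitted_verification.mp4"))
```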


These days, it is fairly simple to create deepfake content, and not only for a proof of concept or a funny TikTok video. Using open-source tools or mass-market commercial products, criminals with enough patience and even a shallow understanding of the technology can bypass banks' restrictions.


Kudos:

Bo0om

Alexandra Murzina

Alexander Migutsky