3 seconds of audio can clone a voice. A few risky words can move money. Here is what stops both during a live call.
Passphrase Protocol · Callback Verification · Dual Authorization · Real-Time Detection · Recovery
Vicall Research Team
These statistics are drawn from FBI IC3 reports, FTC data, academic research, and Keepnet Labs, and reflect the state of voice clone fraud as of 2025.
For generations, recognizing someone's voice was a reliable identity check. That assumption is no longer valid. AI voice cloning has made voice impersonation cheap, fast, real-time, and indistinguishable from the real speaker to the human ear. Every verification protocol that relied on "I recognize that voice" must be rebuilt.
48% accuracy is a coin flip. Research consistently shows humans detect deepfake audio at roughly 48% accuracy on phone-quality audio — no better than random chance. This is not a training failure. The human auditory system evolved for a world where synthesized voices did not exist. Awareness and vigilance cannot solve a signal-level problem. Technology must.
These controls are layered — not alternatives. The strongest protection combines all five. Each addresses a different point in the attack chain, and each compensates for gaps in the others. The only control that operates during the live call itself is Control 5.
Establish a random, nonsensical secret passphrase face-to-face: never digitally, and never by text, email, or phone. The passphrase should be absurd enough that no one could guess it or connect it to anything outside its use: a random combination of words that forms no coherent sentence. Write it down and store it physically.
The protocol: Any call requesting a financial action, wire transfer, or sensitive disclosure must begin with the passphrase before any action is taken. The caller provides the passphrase without being prompted. If they can't produce it, the call is treated as fraudulent regardless of how convincing the voice sounds.
Why it works: No AI has access to information that was never stored digitally. A passphrase established face-to-face and never typed or spoken outside that context cannot be retrieved by an attacker who has only accessed your digital records, social media, or data breach databases.
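The decision rule in the protocol above can be sketched as a small policy function. This is a minimal illustration, not a Vicall or FBI artifact. The request categories and the names `CallCheck` and `passphrase_gate` are hypothetical. The passphrase itself never appears in the code: the employee compares what the caller says against the written copy, and the function records only the outcome. The sketch takes the strict reading of "without being prompted", so a correct answer given only after prompting still fails. How familiar the voice sounds is captured only to show that it has no effect.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    NOT_REQUIRED = "not_required"      # request is not sensitive
    PASSED = "passed"                  # passphrase volunteered and correct
    TREAT_AS_FRAUD = "treat_as_fraud"  # anything else on a sensitive request


# Hypothetical labels for the request types that require the passphrase.
SENSITIVE_REQUESTS = {"financial_action", "wire_transfer", "sensitive_disclosure"}


@dataclass
class CallCheck:
    request_type: str
    volunteered: bool       # caller gave the passphrase without being prompted
    matches_written: bool   # employee judged it against the physical copy
    voice_familiar: bool    # recorded only to show that it is ignored


def passphrase_gate(check: CallCheck) -> Decision:
    if check.request_type not in SENSITIVE_REQUESTS:
        return Decision.NOT_REQUIRED
    # voice_familiar is deliberately never consulted: a convincing voice
    # is not evidence of identity.
    if check.volunteered and check.matches_written:
        return Decision.PASSED
    return Decision.TREAT_AS_FRAUD


clone = CallCheck("wire_transfer", volunteered=False, matches_written=False, voice_familiar=True)
print(passphrase_gate(clone).name)  # TREAT_AS_FRAUD
```

Passing this gate is only one layer. The callback, out-of-band, and dual-authorization controls still apply to the same request.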
Rotate passphrases periodically; quarterly is sufficient for most organizations. Immediately rotate if you suspect a compromise.
Hang up. Call back. For any call requesting a sensitive action, terminate the call and initiate a new outbound call to the person through a number from your verified internal directory, not the number that appeared on your screen during the suspicious call.
Critical detail: Never use a number provided during the suspicious call itself. The attacker controls that number and will answer it. Your verified internal directory is the only source for callback numbers. If the person is not reachable through the directory number, wait — do not proceed with the action based on the original call.
Combine with the passphrase: When you reach the person via callback, require the passphrase before taking any action. This creates a two-factor human verification: independent channel + secret information.
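Callback verification comes down to one rule about where the number is allowed to come from. The sketch below assumes a hypothetical in-memory `VERIFIED_DIRECTORY` standing in for your maintained internal directory, and uses fictional 555-01xx numbers. The function `callback_number` is illustrative. It accepts the caller ID and any number offered during the call as arguments only so that it can visibly ignore them.

```python
from typing import Dict, Optional

# Hypothetical verified internal directory, maintained independently of any call.
VERIFIED_DIRECTORY: Dict[str, str] = {
    "cfo": "+1-555-0100",
    "controller": "+1-555-0101",
}


def callback_number(person: str, caller_id_shown: str,
                    number_offered_on_call: Optional[str]) -> Optional[str]:
    """Return the number to call back, or None, meaning: wait, do not act."""
    # caller_id_shown and number_offered_on_call are attacker-controlled,
    # so they are intentionally never read.
    return VERIFIED_DIRECTORY.get(person)


number = callback_number("cfo", caller_id_shown="+1-555-0199",
                         number_offered_on_call="+1-555-0142")
if number is None:
    print("Not reachable via directory: wait, do not proceed")
else:
    print(f"Hang up, dial {number}, then require the passphrase")
```

Returning `None` instead of falling back to another number encodes the "wait, do not proceed" rule directly.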
Callback verification alone stops the majority of vishing attacks, because the attacker cannot control the callback number in your directory.
Any wire transfer instruction received by phone must be confirmed through a completely separate communication channel before the transfer is processed. Acceptable out-of-band (OOB) channels: a separate email from a verified address to a verified address, an in-person confirmation, or a secure messaging platform established in advance.
Pre-agreed OOB codes: For executives who frequently authorize time-sensitive transfers, establish a system of out-of-band verification codes in advance — short numeric codes that the authorizing executive includes in their OOB confirmation. These codes are established face-to-face or through a secure channel already verified on a prior occasion.
The rule: No wire transfer is processed based on a phone call alone, regardless of who appears to be calling. The phone call can initiate the request, but a separate OOB confirmation closes it.
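The closing rule can be expressed as a check that runs before any transfer is processed. In this sketch the channel labels, `OobConfirmation`, and `oob_confirmed` are hypothetical. It also assumes that the email addresses or messaging accounts behind a channel label were verified beforehand. Pre-agreed codes are compared with Python's `hmac.compare_digest`, a constant-time comparison, which is a conservative habit for any secret value.

```python
import hmac
from dataclasses import dataclass
from typing import Optional

# The OOB channels listed as acceptable in this section (hypothetical labels).
ACCEPTABLE_OOB_CHANNELS = {"verified_email", "in_person", "secure_messaging"}


@dataclass
class OobConfirmation:
    channel: str
    code: str = ""  # pre-agreed OOB code, if this executive uses one


def oob_confirmed(request_channel: str,
                  confirmation: Optional[OobConfirmation],
                  expected_code: Optional[str] = None) -> bool:
    if confirmation is None:
        return False  # the phone call alone can initiate, never close
    if confirmation.channel == request_channel:
        return False  # must be a separate channel from the request
    if confirmation.channel not in ACCEPTABLE_OOB_CHANNELS:
        return False
    if expected_code is not None:
        return hmac.compare_digest(confirmation.code.encode(), expected_code.encode())
    return True


print(oob_confirmed("phone", None))  # False
print(oob_confirmed("phone", OobConfirmation("verified_email", "4821"), "4821"))  # True
```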
Requiring OOB confirmation stops all forms of voice-based business email compromise (BEC) and CEO fraud, because the attacker cannot control both the phone call channel and your verified email simultaneously.
Two people from separate roles, the authorizing executive and the processing employee, must independently authorize any wire transfer. The second authorizer must contact the requester through their own independent channel, not the same call thread or email thread that initiated the request.
The second authorizer calls the executive directly through the verified directory, reaches them independently, and receives confirmation independently — in their own words, not forwarded from the original channel. Both authorizations are documented before the transfer is processed.
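The two-person requirements can be checked mechanically before processing. The sketch below is hypothetical: `Authorization`, `dual_authorized`, and the role and thread labels are illustrative names only. It encodes the conditions stated above: two different people, two different roles, a confirmation channel that is not the thread carrying the request, and both approvals documented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Authorization:
    person: str
    role: str          # e.g. "authorizing_executive" or "processing_employee"
    documented: bool   # recorded before the transfer is processed


def dual_authorized(request_thread: str, confirmation_channel: str,
                    executive: Authorization, processor: Authorization) -> bool:
    if executive.person == processor.person:
        return False  # one person cannot fill both slots
    if executive.role == processor.role:
        return False  # authorizers must come from separate roles
    # The second authorizer must reach the executive independently
    # (e.g. a directory callback), not via the original call or email thread.
    if confirmation_channel == request_thread:
        return False
    return executive.documented and processor.documented


dana = Authorization("Dana", "authorizing_executive", documented=True)
sam = Authorization("Sam", "processing_employee", documented=True)
print(dual_authorized("call-0457", "directory-callback-0458", dana, sam))  # True
print(dual_authorized("call-0457", "call-0457", dana, sam))                # False
```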
Why two is the minimum: An attacker controlling one channel (the call) cannot simultaneously control an independent second call from a different person who initiated contact through the directory. Two-person authorization requires the attacker to compromise two independent contact channels simultaneously, which is operationally infeasible for most attacks.
Controls 1–4 are process controls: they require the employee to remember a protocol and execute it correctly under social pressure from a convincing impersonator. They work, but they depend on human compliance. Control 5 operates during the live call itself.
Vicall analyzes spectral artifacts, prosodic anomalies, and codec fingerprints in the incoming audio stream — markers that distinguish AI-generated speech from natural human speech. It can also alert on configured social-engineering phrases such as routing number, account number, wire transfer, password, verification code, gift card, payroll change, or other terms your team should not handle casually on a call.
Verdict: REAL VOICE, SYNTHETIC DETECTED, or a policy phrase warning — delivered while the call is still happening. On-device: no audio is ever sent to the cloud and no transcript is stored. iOS uses CoreML; Android uses ONNX Runtime; landline deployments use an on-premises Mac mini.
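The policy-phrase side of the verdict can be illustrated with a simple watchlist matcher. This is not Vicall's implementation, which the text does not describe. The sketch assumes two inputs: a transcribed text segment held only in memory, and a synthetic-speech score between 0 and 1 supplied by some detector. The 0.5 threshold is an assumption made for illustration, as is the rule that a synthetic verdict takes precedence over a phrase warning.

```python
import re
from typing import List

# Example watchlist using the phrases named above; each team would configure its own.
WATCHLIST = ["routing number", "account number", "wire transfer", "password",
             "verification code", "gift card", "payroll change"]


def build_matcher(phrases: List[str]) -> "re.Pattern[str]":
    # Escape each word, allow any run of whitespace between words, and anchor
    # on word boundaries so a phrase does not match inside a longer word.
    alternatives = [r"\s+".join(re.escape(w) for w in p.split()) for p in phrases]
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)


def verdict(synthetic_score: float, segment: str, matcher: "re.Pattern[str]",
            threshold: float = 0.5) -> str:
    if synthetic_score >= threshold:
        return "SYNTHETIC DETECTED"
    # Normalize case and whitespace so each phrase is reported once.
    hits = sorted({" ".join(h.lower().split()) for h in matcher.findall(segment)})
    if hits:
        return "POLICY PHRASE WARNING: " + ", ".join(hits)
    return "REAL VOICE"


matcher = build_matcher(WATCHLIST)
print(verdict(0.12, "Can you read me the Routing Number for payroll?", matcher))
# POLICY PHRASE WARNING: routing number
print(verdict(0.91, "Hi, it's me.", matcher))       # SYNTHETIC DETECTED
print(verdict(0.12, "See you at lunch.", matcher))  # REAL VOICE
```

"payroll" on its own is not a hit here, because the watchlist entry is the full phrase "payroll change".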
No voiceprint enrollment required. Detection is based on whether the audio is synthetic and whether configured risk phrases appear, not on whether the voice matches a specific person on file. Protection starts from the first call with no contact setup and no onboarding friction.
If something feels off, or if Vicall flags the call as synthetic, follow these steps immediately. You do not need to be certain. The cost of interrupting a legitimate call is minutes. The cost of completing a fraudulent one can be millions.
Wire fraud recovery is a race against time. The FBI's Financial Fraud Kill Chain has a 66% success rate within 72 hours — and approaches zero after a week. These steps must happen in order, immediately, with no delay between them.
The 72-hour window is real. After 72 hours, funds sent by domestic wire typically reach international accounts or are converted to cryptocurrency. Once converted, tracing and recovery become nearly impossible. The law enforcement recovery rate for wire fraud drops from 66% in the first 72 hours to under 5% after one week. Report to IC3 immediately. Do not wait until Monday, and do not wait for internal approvals.
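The timing rules in this section lend themselves to a quick calculation at the moment a fraudulent transfer is discovered. The sketch below is illustrative only, and `recovery_plan` is a hypothetical helper. The ordered steps are the ones given in the recovery answer of the FAQ in this document. The $50,000 FFKC threshold and the 72-hour and one-week boundaries are the figures quoted here. The example dates and loss amount are invented.

```python
from datetime import datetime, timedelta, timezone
from typing import List

FFKC_WINDOW = timedelta(hours=72)
ONE_WEEK = timedelta(days=7)
FFKC_MIN_LOSS_USD = 50_000  # IC3 complaints activate FFKC for losses over this

ORDERED_STEPS = [
    "1. Call your sending bank and request a recall",
    "2. Call the receiving bank with full account details",
    "3. File an FBI IC3 report at ic3.gov",
    "4. File an FTC report at reportfraud.ftc.gov",
]


def recovery_plan(transfer_time: datetime, now: datetime, loss_usd: float) -> List[str]:
    elapsed = now - transfer_time
    plan = list(ORDERED_STEPS)  # the steps apply regardless of elapsed time
    if elapsed <= FFKC_WINDOW:
        hours_left = (FFKC_WINDOW - elapsed).total_seconds() / 3600
        plan.append(f"Inside the 72-hour window: {hours_left:.1f} h left")
    elif elapsed <= ONE_WEEK:
        plan.append("Past 72 hours: recovery odds have dropped sharply")
    else:
        plan.append("Past one week: recovery odds approach zero")
    if loss_usd > FFKC_MIN_LOSS_USD:
        plan.append("Loss exceeds $50,000: the IC3 report can activate FFKC")
    return plan


sent = datetime(2025, 3, 3, 9, 0, tzinfo=timezone.utc)
found = datetime(2025, 3, 4, 15, 0, tzinfo=timezone.utc)  # 30 hours later
for line in recovery_plan(sent, found, loss_usd=240_000):
    print(line)
```

With these example values, the last two lines printed are "Inside the 72-hour window: 42.0 h left" and "Loss exceeds $50,000: the IC3 report can activate FFKC".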
Voice clone attacks concentrate where phone calls can authorize money movement, credential sharing, or irreversible action. These are the highest-risk profiles by sector, with the specific attack patterns each faces.
Beyond organizational protocols, several technical controls operate at the network and carrier level. Understanding their capabilities and limitations is essential for a complete defense architecture.
Every question security teams, executives, and IT providers ask about voice clone fraud protection — answered directly.
Modern AI voice cloning models require as little as 3 seconds of audio to produce a convincing clone. Sources include YouTube videos, LinkedIn intro clips, earnings call recordings, voicemails, podcast appearances, or any other publicly available audio. The target does not need to be a public figure — anyone whose voice appears in any recording is potentially clonable. The clone is operational in minutes using consumer cloud tools.
Humans cannot reliably detect voice clones by ear. Research shows people detect deepfake audio at approximately 48% accuracy, essentially a coin flip. Phone audio is already compressed and degraded by the codec, which further masks synthetic artifacts. Even trained security professionals fail to identify high-quality voice clones in real-world conditions. The human auditory system evolved for a world where synthesized voices did not exist. Technology-based detection is required; this is not a training problem that awareness can solve.
STIR/SHAKEN does not protect against voice clones. It authenticates the calling phone number; it does not analyze the voice audio itself. An attacker can use a legitimate, STIR/SHAKEN-authenticated VoIP number while transmitting a fully cloned voice. STIR/SHAKEN authentication also breaks down on non-IP segments of the call path, such as traditional PSTN trunks, a gap the FCC documented in April 2025. Caller ID authentication and voice authentication are entirely separate problems requiring separate solutions.
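Looking at what STIR/SHAKEN actually carries makes this limitation concrete. A SHAKEN-signed call carries a token called a PASSporT (RFC 8225, extended for SHAKEN by RFC 8588). Its payload claims describe the originating and destination telephone numbers, an attestation level, a timestamp, and an origination identifier. Nothing in it describes the audio. The sketch below builds a sample token locally, using values modeled on the RFC 8588 example and a placeholder signature, then decodes the payload. It deliberately skips signature verification, which any real deployment must perform against the signer's certificate.

```python
import base64
import json


def b64url_encode(data: bytes) -> str:
    # JWS compact segments are unpadded base64url.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    # Restore the stripped padding before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def passport_claims(token: str) -> dict:
    """Decode, WITHOUT verifying, the payload of a compact-form PASSporT."""
    _header, payload, _signature = token.split(".")
    return json.loads(b64url_decode(payload))


# Illustrative token built locally; real ones arrive in the SIP Identity header.
header = {"alg": "ES256", "ppt": "shaken", "typ": "passport",
          "x5u": "https://cert.example.org/passport.cer"}
payload = {"attest": "A", "dest": {"tn": ["12155551213"]}, "iat": 1443208345,
           "orig": {"tn": "12155551212"},
           "origid": "123e4567-e89b-12d3-a456-426655440000"}
token = ".".join([b64url_encode(json.dumps(header).encode()),
                  b64url_encode(json.dumps(payload).encode()),
                  "c2lnbmF0dXJl"])  # placeholder, not a real signature

claims = passport_claims(token)
print(claims["attest"], claims["orig"]["tn"])  # A 12155551212
print(sorted(claims))  # ['attest', 'dest', 'iat', 'orig', 'origid']
```

Even an "A" (full) attestation only vouches that the signing carrier knows the customer and that the customer is authorized to use the calling number. A cloned voice on that call passes through untouched.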
The passphrase method is an FBI-recommended defense against voice clone fraud. Establish a random, nonsensical secret phrase face-to-face — never digitally, never texted or emailed. Any call requesting a wire transfer or sensitive action must include the passphrase before any action is taken. The caller provides it without being prompted. No AI has access to information that was never stored digitally — so no voice clone, however convincing, can produce a passphrase that only existed in a physical face-to-face conversation.
Funds lost to a voice clone wire transfer can potentially be recovered, but only if you act immediately. The FBI's Financial Fraud Kill Chain (FFKC) has a 66% success rate when activated within 72 hours of the transfer. Call your sending bank immediately to request a recall, then call the receiving bank with full account details. File an FBI IC3 report at ic3.gov; this activates FFKC coordination for transfers over $50,000. File an FTC report at reportfraud.ftc.gov. Recovery probability drops sharply after 72 hours and approaches zero after one week.
The Financial Fraud Kill Chain (FFKC) is an FBI-coordinated rapid response program that connects law enforcement with financial institutions to halt or reverse fraudulent wire transfers. It is activated through an FBI IC3 complaint for losses over $50,000. The FFKC has a 66% success rate when activated within 72 hours of the transfer. After 72 hours, funds typically move through multiple accounts or are converted to cryptocurrency, making recovery increasingly difficult.
Vicall analyzes spectral artifacts, prosodic anomalies, and codec fingerprints in the incoming audio stream — markers that distinguish AI-generated speech from natural human speech. These markers are imperceptible to the human ear but measurable by trained AI models. Vicall runs continuously throughout the call — not just at the start — catching mid-call voice switches where an attacker opens with a real human voice before switching to a clone. Detection verdict appears in under one second. No voiceprint enrollment required.
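A toy example shows the value of scoring continuously rather than once. The per-window scores below are invented for illustration, and the 0.5 threshold and `first_flagged_window` helper are assumptions. This is not Vicall's detector or its output. It only sketches why a check limited to the opening seconds misses a mid-call switch.

```python
from typing import Iterable, Optional


def first_flagged_window(scores: Iterable[float], threshold: float = 0.5) -> Optional[int]:
    """Index of the first audio window whose synthetic score reaches the threshold."""
    for index, score in enumerate(scores):
        if score >= threshold:
            return index
    return None


# Hypothetical scores, one per audio window: a human opens the call,
# then a clone takes over from the fourth window onward.
call_scores = [0.08, 0.11, 0.06, 0.93, 0.95, 0.91]

print(first_flagged_window(call_scores[:1]))  # None: checking only the start misses it
print(first_flagged_window(call_scores))      # 3: continuous scoring catches the switch
```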
Vicall does not send audio to the cloud. It runs entirely on-device: CoreML on iPhone, ONNX Runtime on Android, and a local on-premises Mac mini for landline deployments. No audio is ever transmitted to a cloud server for analysis. Inference happens locally, which means detection works regardless of network conditions, eliminates cloud privacy risk entirely, and produces verdicts in under one second without round-trip latency.
Vicall is the only control that catches voice clone fraud during the live call itself — before any action is taken. On-device. No enrollment. No audio to the cloud. Under one second.