This playbook shows you how to design, launch, and QA AI screening interviews that replace resume screens and recruiter phone screens without hurting candidate trust. The one thing to remember: treat AI screeners like structured interviews with audit trails and human oversight—not a black box. It’s written for hiring managers, founders, and recruiters who need throughput without sacrificing fairness or signal.
What AI screening interviews are
AI screening interviews are automated first-round interviews that happen before a human speaks to the candidate. They come in three main formats:
- Chat-based: a conversational assistant asks knockouts and simple scenario questions over SMS/WhatsApp/web chat. Great for high-volume hourly roles and schedule coordination. Tools like Paradox’s Olivia popularized this style. 1
- Voice: a live or phone-like conversation where the AI asks follow-ups, records audio, and generates a transcript and scorecard. This format approximates the recruiter phone screen and creates stronger “social commitment” than text-only flows. 2
- Video (async or live): candidates record answers to structured questions (one-way) or converse in real time. Enterprises often use AI-assisted video interviewing for scale. 3
What they replace: the resume pass and a 15–30 minute recruiter phone screen that mostly checks basic qualifications, compensation range, work authorization, availability, and a few “Why this role?” questions. Structured, upstream interviews deliver the same data with less scheduling drag and a cleaner audit trail. SHRM’s latest benchmarking pegs average U.S. time-to-fill around 44 days, a reminder that slow first steps ripple downstream. 4
What the current search results for “AI screening interviews” look like (May 2026): top results define the category, pitch speed, and emphasize structured interviewing and fairness/compliance FAQs. Vendor pages dominate, including HireVue (video interviews + AI scoring), Paradox (chat-led screening), and Metaview (interview assistant tooling), alongside explainers on legal/ethical risks from HBR and SHRM. This playbook follows the same arc: definition first, then benefits and risks, a setup and QA plan, and vendor selection. 3
Why operators adopt them
- Time and throughput. If a recruiter runs 30 screens per req at 20 minutes each plus 10 minutes of notes/admin, that’s 30 × 30 minutes = 15 hours before a manager sees a shortlist. Multiply by 10 open reqs and you’re at 150 hours of screening time. Moving this step to an AI screen shortens time-to-decision and time-to-fill; SHRM’s 44-day average leaves plenty of room to recapture days in the earliest stage. Even a two-day acceleration at the top can pull every downstream stage earlier. 4
- Consistency and structure. AI screeners enforce question order, timeboxing, and scoring rubrics, so candidates are evaluated against the same criteria. Research summarized by HBR warns that unstructured or opaque algorithmic tools can introduce bias—but the inverse is also true: structured interviews, clear criteria, and transparency reduce noise. Design choices, not “AI” alone, determine equity. 5
- Compliance posture. In the U.S., the EEOC and DOJ have published guidance on AI in hiring, with special focus on disability discrimination risks and the need for accommodations. If you use automated interviews, you must disclose, provide alternatives on request, and monitor adverse impact. NYC’s Local Law 144 goes further, requiring annual bias audits and candidate notices when automated employment decision tools substantially assist hiring decisions. 6
- Candidate experience reality check. Surveys in 2025–2026 show that candidates don’t reject AI categorically—they reject opacity. Greenhouse reports that top abandonment triggers are undisclosed AI use and pre-recorded video scored by AI with no human present; candidates want disclosure, a simple explanation of what’s measured, and the option to request a human interview. Gartner’s 2025 survey also found only 26% of candidates trust AI to evaluate them fairly—again underscoring the need for transparent programs. 7
If you want the ROI math at the cost-per-hire level, see our internal benchmarks in Cost-per-hire benchmarks 2026.
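The screening-time arithmetic above is easy to rerun with your own numbers. The sketch below is illustrative only: the function name and parameters are assumptions, and the inputs are the 30-screen, 20-minute, 10-minute-admin example from this section.

```python
def screening_hours(screens_per_req: int,
                    minutes_per_screen: float,
                    admin_minutes: float,
                    open_reqs: int = 1) -> float:
    """Recruiter hours spent on phone screens across all open reqs."""
    total_minutes = screens_per_req * (minutes_per_screen + admin_minutes) * open_reqs
    return total_minutes / 60


# Example from this section: 30 screens, 20 min each, 10 min of notes/admin.
per_req = screening_hours(30, 20, 10)                     # one req
across_ten = screening_hours(30, 20, 10, open_reqs=10)    # ten open reqs
print(per_req, across_ten)  # 15.0 150.0
```

Swap in your own screen counts and durations before quoting a savings figure; the result is only as good as those inputs.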
How to set one up
Treat this as a structured-interview build with three layers: questions, scoring, and controls.
Step 1 — Write a crisp screening prompt
- Inputs: your JD, minimum qualifications, and the two must-have competencies for the first 90 days.
- Output: 5–7 questions, each mapped to exactly one competency or minimum qualification. Use action-oriented prompts (“Walk me through…”) and one scenario per competency. Pull common starters from your own bank or from our interview question library.
Step 2 — Order the questions for early signal
- Lead with deal-breakers (work authorization, schedule, travel, shift constraints), then move to the highest-signal competency scenario, then motivation/role fit. Keep each response window short (60–120 seconds for voice/video; 500–700 characters in chat).
Step 3 — Build the scoring rubric
- For each question, define 3–5 observable behaviors and a 1–4 scale with anchors. Example for “Handle an angry customer” (CS role):
  - 4 = Acknowledges emotion, confirms policy, offers specific options, and proposes follow-up.
  - 1 = Vague, no de-escalation, restates policy only.
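One way to keep rubric anchors reviewable is to store them as data rather than free text. This is a minimal sketch, not a prescribed schema: `RubricQuestion` and its methods are hypothetical names, and only the two anchors given in the example above are filled in.

```python
from dataclasses import dataclass, field


@dataclass
class RubricQuestion:
    """One screening question with anchored scores (hypothetical structure)."""
    prompt: str
    anchors: dict[int, str] = field(default_factory=dict)  # score -> observable behavior
    scale: tuple[int, int] = (1, 4)

    def validate_score(self, score: int) -> int:
        """Reject scores that fall outside the rubric scale."""
        low, high = self.scale
        if not low <= score <= high:
            raise ValueError(f"score {score} is outside the {low}-{high} scale")
        return score

    def missing_anchors(self) -> list[int]:
        """Scale points that still need a written anchor."""
        low, high = self.scale
        return [s for s in range(low, high + 1) if s not in self.anchors]


angry_customer = RubricQuestion(
    prompt="Handle an angry customer",
    anchors={
        4: "Acknowledges emotion, confirms policy, offers specific options, and proposes follow-up.",
        1: "Vague, no de-escalation, restates policy only.",
    },
)
print(angry_customer.missing_anchors())  # [2, 3]
```

The `missing_anchors` check is a reminder that the middle of the scale needs written anchors too before reviewers can score consistently.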
Step 4 — Configure integrity and accessibility
- Integrity: enable browser focus checks, paste/keystroke heuristics, duplicate-answer flags, and inconsistent-locale alerts. Offer live-proctoring only where proportionate. Gartner reports ~6% of candidates admit to interview fraud; build guardrails, but don’t turn your screener into an exam. 8
- Accessibility: publish how the AI will be used and measured; allow a human alternative on request; provide time extensions or text-mode alternatives. This aligns with ADA guidance. 6
Step 5 — Decide escalation rules
- Examples: “Escalate if composite score ≥ 3.2/4 and no knockout fails,” or “Escalate for niche skill mentions even if score < 3.0, tag for human review.” Always retain a human override.
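The example rules above can be written as a small routing function. Treat this as a sketch under assumptions: the field names and route labels are hypothetical, and sending every remaining candidate to recruiter review (rather than auto-rejecting) is one possible reading of "always retain a human override", not a rule stated here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScreenResult:
    composite_score: float            # mean rubric score on the 1-4 scale
    knockout_failed: bool             # any deal-breaker question failed
    niche_skill_mentioned: bool = False


def route(result: ScreenResult,
          threshold: float = 3.2,
          human_override: Optional[str] = None) -> str:
    """Return a routing label; a human override always wins."""
    if human_override is not None:
        return human_override
    if not result.knockout_failed and result.composite_score >= threshold:
        return "escalate"
    if result.niche_skill_mentioned:
        # Below threshold, but flagged so a person looks at it.
        return "escalate_tagged_for_human_review"
    # Assumption: no automatic rejection; a recruiter decides the rest.
    return "recruiter_review"


print(route(ScreenResult(3.4, knockout_failed=False)))
print(route(ScreenResult(2.8, knockout_failed=False, niche_skill_mentioned=True)))
```

Keeping the threshold as a parameter makes it easy to log which value was in force when a decision was made.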
Step 6 — Pilot with 10–20 candidates
- Shadow-score 10 human-run screens against the AI rubric; reconcile gaps before rollout.
Implementation note: If you’d rather not build from scratch, skip the setup—Raffi ships with a pre-tuned 7‑question screener for common roles. Start free.
Quality assurance
Score what you can hear and verify—not vibes. A simple QA loop keeps the model (and your hiring bar) honest.
- What to listen for
  - Content: role-specific facts, accurate terminology, and decision tradeoffs.
  - Process: does the candidate structure answers (situation→action→result), ask clarifying questions, and quantify outcomes?
  - Professional pragmatics: tone control, empathy in service roles, and stakeholder alignment in PM/lead roles.
- Transcripts and review hygiene
  - Use transcripts to anchor evidence in your notes. If you rely on automatic speech recognition (ASR), remember error rates vary across accents and dialects; a 2020 study found commercial ASR had nearly double the word-error rate for Black speakers vs. white speakers. That’s a prompting and review policy issue: require humans to check low-confidence transcripts before rejecting. 9
  - Real-time caption/transcript tools (e.g., Gemini Live’s post-session transcripts) help with calibration, but they’re not a substitute for reviewer accountability. Always link the score to transcript snippets. 10
- Scoring and drift checks
  - Weekly: pull 20 random passes/fails near your threshold; re-score blind with a second reviewer. If >15% flip near the line, tighten rubric anchors or adjust threshold.
  - Monthly: check subgroup pass rates (e.g., gender, race/ethnicity where lawful and available) and feature importance for your model. Where automated scoring is used, watch for automation bias (humans over-trusting AI recommendations) and enforce a written rationale for borderline rejections. 11
- When to retrain
  - Any time your job content shifts (new stack, market, sales motion), or when pass rates move by >10% without a change in sourcing. Keep a change log for audit readiness (useful for NYC LL 144 environments). 12
Candidate experience
You can earn candidate trust with three simple commitments:
- Radical clarity up front
  - Tell candidates they’ll do an AI-led screen, what it measures, how long it takes, and how a human will use the results. The Greenhouse 2026 data shows drop-off spikes when AI use isn’t disclosed or when pre-recorded video is scored with no human present; 46% want the option to request a human interview instead. Build that opt-out. 7
- Offer alternatives and accommodations
  - Provide text-mode or slower-paced chat for speech impairments, and human-led alternatives upon request. The EEOC/DOJ ADA guidance is explicit here. 6
- Keep a human in the loop
  - Make it clear a recruiter reviews borderline cases and all final decisions. HBR’s ethics primer is a good internal training read for interviewers and approvers. 5
Operationally, publish a short “What to Expect” page, send it with the invite, and include a retry policy for legitimate technical issues. If you’re using async video, remind candidates they can pause between questions—small UX details reduce anxiety. For a product walkthrough of the human review side, see How Raffi works.
Choosing a vendor
Here’s a quick orientation to three common choices teams compare:
| Vendor | Primary modality | Core use | AI scoring | Scheduling | Notes |
|---|---|---|---|---|---|
| HireVue | One‑way video + conversational AI + assessments | Enterprise video screening | Yes (with published explainability and validity statements) | Yes | Deep feature set for structured video and assessments. 3 |
| Paradox (Olivia) | Chat/SMS/WhatsApp | High‑volume chat apply, screening, scheduling | Chat-led screening | Excellent | Chat-first experience; great for apply-to-interview flow, not a native video screener. 1 |
| Metaview | Recorder + AI notes for human-led interviews | Interview note-taking/insights | N/A (summarization/insights) | N/A | Not a screener; enhances human interviews with AI notes and search. 13 |
Decision framework (use this to shortlist, then go deep on our comparisons: Raffi vs HireVue, Raffi vs Paradox, Raffi vs Metaview):
- Modality fit: do you need voice/video signal or is chat enough?
- Workflow control: can you set question order, timeboxes, and rubric anchors?
- Anti‑cheat + accessibility: is there proportional integrity monitoring and an ADA-compliant alternative path? 6
- Explainability: can reviewers see evidence-linked scores and re-score?
- Candidate trust: is disclosure baked into invites and landing pages?
- Compliance: bias audit posture if you hire in NYC (LL 144) or similar regimes. 12
See Raffi vs HireVue side‑by‑side.
Implementation timeline
Week 1 — Design
- Intake with hiring managers to define must-haves, nice-to-haves, and deal-breakers per role family.
- Draft 7-question screener and 1–4 rubric with anchors.
- Write candidate comms (invite, prep tips, privacy/AI use page) and escalation rules.
Week 2 — Pilot
- Shadow-run 10–20 screens alongside human phone screens.
- Calibrate scoring; tune question order and timeboxes.
- Configure integrity checks and ADA alternatives; test with assistive tech.
Week 3 — Rollout
- Launch to 1–2 roles; enable auto‑advance thresholds; instrument analytics for pass rates by source.
- Train reviewers on evidence‑based notes and how to override AI scores.
Week 4 — Scale
- Expand to adjacent roles; publish an internal dashboard for time-to-fill (TTF) and funnel drop‑offs.
- Start monthly adverse impact reviews and a lightweight change log for prompts/rubrics.
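For the monthly adverse impact review, a common starting point is the impact ratio: each group's pass rate divided by the highest group's pass rate. The sketch below makes several assumptions that are not from this playbook: the function names and group labels are hypothetical, and the 0.8 cutoff is the widely used four-fifths rule of thumb, a screening heuristic rather than a legal determination.

```python
def impact_ratios(passed: dict[str, int], total: dict[str, int]) -> dict[str, float]:
    """Each group's pass rate relative to the highest group's pass rate."""
    rates = {g: passed.get(g, 0) / n for g, n in total.items() if n > 0}
    if not rates:
        raise ValueError("no group has any candidates")
    top = max(rates.values())
    if top == 0:
        raise ValueError("no group has any passes")
    return {g: r / top for g, r in rates.items()}


def flagged_groups(ratios: dict[str, float], threshold: float = 0.8) -> list[str]:
    """Groups whose impact ratio falls below the threshold (assumed 0.8)."""
    return sorted(g for g, r in ratios.items() if r < threshold)


# Hypothetical monthly counts; group labels are placeholders.
passed = {"group_a": 50, "group_b": 30}
total = {"group_a": 100, "group_b": 100}
ratios = impact_ratios(passed, total)
print(flagged_groups(ratios))  # ['group_b']
```

Small groups produce noisy ratios, so treat a flag as a prompt for human investigation and log the counts in the change log rather than acting on the number alone.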
Two visuals we recommend:
- Architecture diagram: “Interview → transcript → AI score → human review → decision”
- Ramp timeline (week 1–4)
How Raffi handles this
Raffi runs 7-question voice or video screeners that feel like a real conversation, then gives you an evidence-linked scorecard for each candidate. Under the hood, we run integrity checks (copy/paste, duplicate answers, off‑screen prompts) and an anti‑cheat score you can use as a nudge to escalate to a human follow‑up when something looks off. Reviewers see the transcript with highlights, the rubric rationale, and can re-score or override in one click.
We price on “candidate reveal” rather than seats—so you can invite the entire pipeline, then only pay to unlock the full reports for the ones you want to move forward. That keeps your top-of-funnel experimentation cheap and your downstream interviews focused. The whole flow is upstream of your ATS; when you’re ready, push shortlisted candidates into your normal process.
If you want to see it in action, start at How Raffi works. Ready to trial it on one role this week? Head to /raffi/start.
Frequently asked
What is an AI screening interview?
Are AI screening interviews legal in the U.S.?
Do candidates hate AI interviews?
How much time can AI screens actually save?
What about bias?
Do voice AI interviews disadvantage some speakers?
How do I disclose AI use to candidates?
What controls should I enable to deter cheating?
Do I need a bias audit?
Where do AI screens sit in my stack?
Sources
Every claim in this article links to a real public source.