An AI secretary that answers for you, turns away unwanted calls and only puts through the calls that matter
1. Problem statement
The phone has become a target: individuals receive on average more than 300 unwanted calls per year
(UFC-Que Choisir), voice fraud reports jumped by 113% in 2025 (ARCEP, the French telecom regulator),
and voice-cloning scams only need a few seconds of audio to imitate someone.
As a result, people simply stop answering unknown numbers, and genuinely important calls
(doctor, delivery, administration…) get lost among the telemarketing calls.
The goal of this project was to build a complete service, designed from day one as a sellable product,
able to:
automatically answer on the user's behalf;
identify the caller and the reason for the call;
politely turn away telemarketing and scam attempts;
only transfer legitimate calls, with the context displayed before picking up.
This product-grade ambition imposed strong constraints: reliable screening in real-world conditions,
a multi-tenant architecture, online payments, and personal data protection.
2. The solution: Iris
Iris is an AI secretary that sits between the user's number and the outside world. When an unknown
number calls, she answers with a natural French voice and leads a real conversation: who is calling,
from which company, and for what reason.
known telemarketing and scam scenarios (fake bank advisor, training-credit or energy scams…) are recognised and politely turned away;
legitimate calls are transferred, and the user sees the name, company and reason on screen before even picking up;
whitelisted close contacts ring straight through, no questions asked;
blacklisted numbers are rejected immediately;
the user's voice is never exposed to a stranger — a safeguard against voice cloning;
everything is traced: history, transcripts and notifications, including turned-away and missed calls.
Two integration modes are offered: either the user hands out their Iris number (their real number
stays private), or they keep their usual number and enable simple call forwarding — screened calls
then reach them directly inside the app.
3. A complete product
Beyond the screening itself, the project covers the whole customer experience:
Mobile app
Dashboard with service status and statistics, call history with transcripts, whitelist and blacklist
management (with phone contacts import), a dedicated incoming-call screen showing the context collected
by Iris, and guided settings to activate the service.
Website
A bilingual (FR/EN) marketing site presenting the service with sourced statistics and animated diagrams,
plus a customer area to browse call history and manage the subscription and the account.
Subscription
Online subscription with a free trial, a self-service portal (payment method, invoices, cancellation)
and GDPR-compliant one-click account deletion.
4. Outcome
The system has been validated end to end in real conditions: a real incoming call is answered by Iris,
screened through conversation, then transferred — the app rings, the context is displayed, and the
conversation stays stable until hang-up.
complete screening: whitelist and blacklist, AI dialogue, transfer with context;
exhaustive history, including calls turned away without dialogue and missed calls with notifications;
full customer lifecycle: sign-up, trial, subscription, management and account deletion.
This is the most complete project I have built so far: it combines real-time telephony, conversational
AI, a backend, a multi-tenant database, a mobile app, a website, payments and self-hosted deployment.
The telephony part required three successive architectures before reaching a reliable call transfer —
a genuine engineering effort detailed in the technical solution.
1. Overall architecture
The system is built around a central backend that orchestrates three client surfaces
(mobile app, website, push notifications) together with telephony and AI services.
[Figure: overall architecture. The backend orchestrates telephony, the AI voice agent, the database and the client surfaces.]
The project's guiding principle: the backend is the conductor. The conversational AI talks with the
caller, but every call-routing decision (reject, transfer, destination) is executed by the backend —
never delegated to a third-party service. This rule, learned from the most expensive lesson of the
project (see section 4), guarantees reliable, controlled behaviour.
The architecture is multi-tenant by design: each customer has their own dedicated number, and the
backend resolves the relevant account from the called number on every incoming webhook.
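As a minimal sketch of that per-webhook resolution, assuming an in-memory index where the real system would presumably query the database: Twilio posts the dialled number as `To`, and the lookup maps it to an account. The `Tenant` shape and both function names are hypothetical.

```typescript
// Hypothetical tenant record; field names are assumptions for illustration.
interface Tenant {
  id: string;
  irisNumber: string; // the customer's dedicated number, in E.164 format
}

// Subset of the parameters Twilio sends on an incoming-call webhook.
interface IncomingCallWebhook {
  CallSid: string;
  From: string; // caller
  To: string; // called (dedicated) number
}

// Index tenants by dedicated number so each webhook resolves in O(1).
function buildTenantIndex(tenants: Tenant[]): Map<string, Tenant> {
  const index = new Map<string, Tenant>();
  for (const tenant of tenants) index.set(tenant.irisNumber, tenant);
  return index;
}

// Returns undefined when the called number belongs to no customer.
function resolveTenant(
  index: Map<string, Tenant>,
  webhook: IncomingCallWebhook,
): Tenant | undefined {
  return index.get(webhook.To);
}
```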
2. Tech stack
Telephony — Twilio Programmable Voice: numbers, TwiML webhooks, call bridging and the VoIP SDK.
Voice agent — Vapi.ai for real-time STT → LLM → TTS orchestration, with GPT-4.1 for the dialogue, Speechmatics for French transcription and Cartesia for the synthetic voice.
Payments — Stripe: Checkout, Customer Portal and webhooks.
Notifications — Expo / FCM push for every screened call.
Hosting — personal TrueNAS Scale server (Docker Compose), exposed through an outbound HTTPS tunnel.
3. The journey of a call
Entry and immediate triage
Every incoming call triggers a webhook to the backend, which identifies the customer then triages:
a blacklisted number is rejected immediately; a whitelisted number is transferred directly, without AI;
an unknown number is sent to the voice assistant. In that last case, the backend records the
identifier of the parent call leg, the keystone of the transfer (see section 4).
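The triage can be sketched as a pure function. This is an illustration under assumptions, not the project's code: the list shapes and names are hypothetical, and the rule that the blacklist wins when a number sits in both lists is a guess based on the order given above.

```typescript
// Outcome of the immediate triage. `parentCallSid` is kept only when the
// call goes to screening, because the later transfer needs it.
type TriageDecision =
  | { kind: "reject" }
  | { kind: "transfer" }
  | { kind: "screen"; parentCallSid: string };

// Hypothetical per-tenant lists, with numbers stored in E.164 format.
interface TenantLists {
  whitelist: Set<string>;
  blacklist: Set<string>;
}

function triage(lists: TenantLists, from: string, callSid: string): TriageDecision {
  // Assumption: the blacklist wins if a number appears in both lists.
  if (lists.blacklist.has(from)) return { kind: "reject" };
  // Whitelisted contacts bypass the AI entirely.
  if (lists.whitelist.has(from)) return { kind: "transfer" };
  // Unknown caller: screen, and remember which parent call to take back later.
  return { kind: "screen", parentCallSid: callSid };
}
```

Keeping the decision separate from the TwiML that implements it makes the three branches easy to unit-test without any telephony.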
The screening dialogue
The assistant collects the name, the company and the reason, then calls an evaluation tool on the
backend. Three verdicts are possible: legitimate call (transfer announced then executed), call to block
(the assistant politely declines and hangs up), or ambiguous case — the assistant then decides on its
own according to its rules, with a deliberate philosophy: when in doubt, transfer. Better one call too
many than a family call rejected.
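A simplified sketch of the three verdicts and of the "when in doubt, transfer" rule. The keyword list is purely illustrative, and in the real system the ambiguous case is settled by the assistant's own rules; here it is collapsed into a single function to make the bias explicit.

```typescript
type Verdict = "legitimate" | "block" | "ambiguous";
type Action = "transfer" | "decline";

// What the assistant collected during the dialogue.
interface CallerStatement {
  name: string;
  company: string;
  reason: string;
}

// Illustrative keywords only; the actual rules are not given in the text.
const BLOCK_KEYWORDS = ["training credit", "energy contract", "account security department"];

function evaluate(statement: CallerStatement): Verdict {
  const text = `${statement.company} ${statement.reason}`.toLowerCase();
  if (BLOCK_KEYWORDS.some((keyword) => text.includes(keyword))) return "block";
  // A caller who gave both a name and a reason is treated as legitimate.
  if (statement.name.trim() !== "" && statement.reason.trim() !== "") return "legitimate";
  return "ambiguous";
}

function decide(verdict: Verdict): Action {
  // When in doubt, transfer: only an explicit "block" is declined.
  return verdict === "block" ? "decline" : "transfer";
}
```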
End of call and traceability
If the recipient doesn't answer, the caller hears a polite message and the user receives a
"missed call" notification — since the caller already gave their name and reason, voicemail becomes
unnecessary. The end-of-call report (transcript, duration, end reason) then completes the history.
Cascading safety nets guarantee that no call is ever orphaned: every call exists in the history,
even if the caller hangs up before the screening ends.
4. Telephony: keeping ownership of the call
The hardest part of the project: it took three successive architectures to reach a reliable
call transfer.
First dead end — handing over the number
The first version handed the number directly to the voice-agent platform. Simple, but the call then
"belongs" to the third-party service: it becomes impossible to redirect it ourselves, and the SIP
transfer request (REFER) is refused on a regular phone (PSTN) leg. Any call-takeover strategy was
blocked by design.
Second dead end — the provider's warm transfer
The transfer mechanism offered by the voice-agent platform turned out to be broken: the destination
picked up in under a second, but the leg was systematically cut about three seconds later, without the
caller ever being bridged. The issue was confirmed in the carrier's call logs, reproduced on every
available path, and traced to the provider's internals, so it could not be fixed on our side.
The final architecture
The number stays managed by our own telephony: our TwiML dials an authenticated SIP leg
(digest authentication) towards the voice agent. Two decisive consequences: the leg towards the AI
is a genuine SIP leg, and the parent leg remains a call we own — therefore freely redirectable.
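The TwiML for such a leg might look like the output of the sketch below. The `<Sip>` noun and its `username` and `password` attributes are standard TwiML, as is the `action` URL on `<Dial>`; the SIP URI, credentials, webhook URL and function names are placeholders, not the project's values.

```typescript
// Escape text for safe inclusion in XML attributes and element bodies.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical description of the voice agent's SIP endpoint.
interface SipTarget {
  uri: string; // e.g. "sip:agent@example.invalid"
  username: string; // digest-authentication credentials
  password: string;
}

// Build TwiML that dials the voice agent as a child SIP leg of our own call.
// `actionUrl` is the webhook Twilio requests when the dialled leg ends.
function dialVoiceAgentTwiml(target: SipTarget, actionUrl: string): string {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    `  <Dial action="${escapeXml(actionUrl)}">`,
    `    <Sip username="${escapeXml(target.username)}" password="${escapeXml(target.password)}">${escapeXml(target.uri)}</Sip>`,
    "  </Dial>",
    "</Response>",
  ].join("\n");
}
```

Because the `<Dial>` runs inside TwiML that we serve, the inbound call remains the parent and the SIP leg is only its child.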
The transfer then follows a call-takeover strategy (the official "call screening" pattern):
on entry, the backend stores the parent call identifier in a short-lived in-memory registry;
the assistant screens the call — its SIP leg is only a child of our call;
on a transfer verdict, the backend takes the parent call back through the API and injects a new <Dial> scenario towards the destination;
the AI leg is hung up and the caller is bridged with the recipient: the bridge is 100% telephony, with no dependency on an experimental third-party mechanism.
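The steps above can be sketched with a small expiring registry and a request builder. This is an illustration under assumptions: the correlation key is hypothetical (the text does not say how the assistant's tool call is matched to the parent call), and the request is only built, not sent. The endpoint and the `Twiml` form parameter are those of Twilio's Call resource for updating a call in progress; sending the request also requires HTTP Basic authentication, which is left out here.

```typescript
interface ScreeningEntry {
  parentCallSid: string;
  tenantId: string;
  expiresAt: number; // epoch milliseconds
}

// Short-lived in-memory registry of calls currently being screened.
class ScreeningRegistry {
  private readonly entries = new Map<string, ScreeningEntry>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // `now` is injectable so that expiry can be tested without waiting.
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  remember(key: string, parentCallSid: string, tenantId: string): void {
    this.entries.set(key, { parentCallSid, tenantId, expiresAt: this.now() + this.ttlMs });
  }

  // Returns the entry at most once, and never after it has expired.
  take(key: string): ScreeningEntry | undefined {
    const entry = this.entries.get(key);
    this.entries.delete(key);
    return entry !== undefined && entry.expiresAt > this.now() ? entry : undefined;
  }
}

interface TakeoverRequest {
  method: "POST";
  url: string;
  body: string; // application/x-www-form-urlencoded
}

// Build the REST request that replaces the parent call's TwiML.
function buildTakeoverRequest(
  accountSid: string,
  parentCallSid: string,
  twiml: string,
): TakeoverRequest {
  return {
    method: "POST",
    url: `https://api.twilio.com/2010-04-01/Accounts/${accountSid}/Calls/${parentCallSid}.json`,
    body: `Twiml=${encodeURIComponent(twiml)}`,
  };
}
```

Taking the entry out of the registry on first use means a duplicated tool call cannot redirect the same parent call twice.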
Diagnosing this part required a close reading of SIP traces (Via headers, error codes,
authentication challenges) — real network investigation work, ultimately validated by real
end-to-end calls.
5. Backend and database
The Express backend exposes the telephony webhooks (incoming call, end of bridging), the tool webhooks
called by the assistant (evaluation, connection, end-of-call report), VoIP token delivery for
authenticated users and a health probe used by the app and the website.
Internal services are split by responsibility:
call classification (telemarketing/scam keywords, with an optional LLM classifier);
an in-memory registry of calls currently being screened — the core of the takeover strategy;
structured notifications and push;
a multi-tenant storage facade.
On the robustness side: every call to an external service is wrapped so that a failure degrades
the feature without blocking the call, webhooks validate their inputs, and verdicts have cascading
safety nets (immediate verdict → end-of-call report → orphan-row creation → notification).
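One possible reading of that cascade, as a sketch: the exact semantics are not spelled out in the text, so the row shape, the `source` values and the choice to notify when an orphan row is created are all assumptions.

```typescript
type HistoryVerdict = "transferred" | "blocked" | "missed" | "unknown";

interface HistoryRow {
  callSid: string;
  verdict: HistoryVerdict;
  source: "immediate" | "end-of-call-report" | "orphan";
  transcript?: string;
}

interface EndOfCallReport {
  callSid: string;
  transcript: string;
  verdict?: HistoryVerdict; // absent if the caller hung up before a verdict
}

class CallHistory {
  readonly rows = new Map<string, HistoryRow>();
  readonly notifications: string[] = [];

  // Net 1: write the verdict as soon as the backend knows it.
  recordVerdict(callSid: string, verdict: HistoryVerdict): void {
    this.rows.set(callSid, { callSid, verdict, source: "immediate" });
  }

  // Nets 2 to 4: the report completes an existing row, supplies the verdict
  // if none was recorded, or creates an orphan row and queues a notification.
  applyReport(report: EndOfCallReport): HistoryRow {
    const existing = this.rows.get(report.callSid);
    let row: HistoryRow;
    if (existing) {
      row = { ...existing, transcript: report.transcript };
    } else if (report.verdict) {
      row = {
        callSid: report.callSid,
        verdict: report.verdict,
        source: "end-of-call-report",
        transcript: report.transcript,
      };
    } else {
      row = {
        callSid: report.callSid,
        verdict: "unknown",
        source: "orphan",
        transcript: report.transcript,
      };
      this.notifications.push(report.callSid);
    }
    this.rows.set(report.callSid, row);
    return row;
  }
}
```

Whichever branch runs, a row exists for the call afterwards, which is the property the safety nets are meant to guarantee.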
The Postgres database (Supabase) stores profiles, whitelist and blacklist, and the call history
(verdict, decision source, transcript, duration). Access security relies on Row Level Security:
an authenticated user can only read and write their own rows; the server key that bypasses those rules
is never exposed client-side; the history is read-only for clients. The schema evolved through
versioned SQL migrations, including a trigger that automatically creates the business profile
at sign-up.
6. Mobile app
The app (Expo / React Native, TypeScript) offers a dashboard, history with transcripts, list management
with phone-book import (numbers normalised to international format), a dedicated incoming-call screen
and settings — with a light/dark theme.
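A minimal sketch of the phone-book normalisation, assuming France (+33) as the default country for numbers written in national format. The function name and the handling of edge cases are illustrative; a production app would more likely rely on a dedicated phone-number library.

```typescript
// Normalise a phone-book entry to E.164, or return null if it cannot be.
function normalizeToE164(raw: string, defaultCountryCode = "33"): string | null {
  // Drop the optional trunk prefix written as "(0)", as in "+33 (0)6 ...".
  const cleaned = raw.trim().replace(/\(0\)/g, "");
  const hasPlus = cleaned.startsWith("+");
  const digits = cleaned.replace(/\D/g, "");

  let candidate: string;
  if (hasPlus) {
    candidate = `+${digits}`; // already international
  } else if (digits.startsWith("00")) {
    candidate = `+${digits.slice(2)}`; // international call prefix
  } else if (digits.startsWith("0")) {
    candidate = `+${defaultCountryCode}${digits.slice(1)}`; // national format
  } else {
    return null; // ambiguous without a country hint
  }

  // E.164: a plus sign, then at most 15 digits, the first one non-zero.
  return /^\+[1-9]\d{6,14}$/.test(candidate) ? candidate : null;
}
```

Storing every number in one canonical form is what lets a whitelist or blacklist lookup match the caller ID regardless of how the contact was typed.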
Receiving a call inside the app: the VoIP chain
In "full service" mode, the user's real number is forwarded to Iris, so a classic GSM transfer
back to that number would itself be forwarded to Iris and loop forever. Legitimate calls are
therefore delivered directly inside the app through the VoIP SDK:
the backend delivers a temporary access token to the authenticated user, and the app registers itself automatically at launch — zero configuration after installation;
with the app closed, a silent push notification wakes the native module, which rings the phone full screen, even when locked;
the call context (name, company, reason) travels with the call and is displayed on the ringing screen.
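One way the context can travel with the call is through TwiML custom parameters: `<Parameter>` elements nested in `<Client>` are delivered to the Voice SDK alongside the incoming call invite. The sketch below assumes that mechanism; the parameter names and the client identity are hypothetical.

```typescript
// Escape text for safe inclusion in XML attributes and element bodies.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Context collected by the assistant during screening.
interface CallContext {
  name: string;
  company: string;
  reason: string;
}

// Build TwiML that rings the user's app and attaches the screening context.
function dialAppTwiml(identity: string, context: CallContext): string {
  const parameter = (name: string, value: string): string =>
    `      <Parameter name="${name}" value="${escapeXml(value)}"/>`;
  return [
    "<Response>",
    "  <Dial>",
    "    <Client>",
    `      <Identity>${escapeXml(identity)}</Identity>`,
    parameter("callerName", context.name),
    parameter("callerCompany", context.company),
    parameter("callReason", context.reason),
    "    </Client>",
    "  </Dial>",
    "</Response>",
  ].join("\n");
}
```

Escaping matters here because the values come straight from what the caller said.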
Among the pitfalls encountered: a microphone permission that is declared but not granted at runtime
produces a call that connects… in total silence, and a call invite that arrives before JavaScript
has started must be explicitly retrieved at launch.
Updates ship over the air (EAS Update): any JavaScript change is pushed in about a minute, with no
reinstall and no store review — native rebuilds are only needed when native modules change.
7. Website and payments
The Next.js website (App Router, server-side rendering) is fully bilingual FR/EN. The marketing pages
present the service with sourced real-world statistics and animated SVG diagrams of the call journey;
the customer area (cookie-based sessions) gives access to the dashboard, the call history and
account management.
The payment lifecycle relies on Stripe with a few key design choices:
subscription via Checkout with a trial period, management (invoices, cancellation) via the Customer Portal;
the signed webhook synchronises state in real time, but the source of truth is a direct reconciliation with Stripe every time the subscription page is displayed, so a missed webhook is corrected on the next page view;
every "service active" display (website and app) is conditioned on the actual subscription state;
the app opens the browser already signed in, thanks to a single-use login link generated by the backend;
account deletion (GDPR) cancels the subscription, erases data in cascade and removes the authentication identity, in one click.
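The reconciliation on page view can be sketched as follows. The status values are Stripe's subscription statuses; everything else is an assumption: `fetchLiveStatus` is a hypothetical stand-in for the call to Stripe, the cache stands in for the state kept in sync by webhooks, and treating only `trialing` and `active` as "service active" is one possible policy.

```typescript
// Subscription statuses as defined by Stripe.
type StripeStatus =
  | "incomplete"
  | "incomplete_expired"
  | "trialing"
  | "active"
  | "past_due"
  | "canceled"
  | "unpaid"
  | "paused";

// "none" means the user has no subscription at all.
type LiveStatus = StripeStatus | "none";

// Assumed policy: the service runs only during a trial or a paid period.
function isServiceActive(status: LiveStatus): boolean {
  return status === "trialing" || status === "active";
}

// Hypothetical function that asks Stripe for the current status.
type FetchLiveStatus = (userId: string) => Promise<LiveStatus>;

interface ReconciledState {
  status: LiveStatus;
  serviceActive: boolean;
  corrected: boolean; // true if the cached state had drifted
}

// Called each time the subscription page is rendered: the live answer
// overwrites whatever the webhooks had (or had not) written.
async function reconcileOnPageView(
  userId: string,
  cache: Map<string, LiveStatus>,
  fetchLiveStatus: FetchLiveStatus,
): Promise<ReconciledState> {
  const cached = cache.get(userId) ?? "none";
  const live = await fetchLiveStatus(userId);
  cache.set(userId, live);
  return { status: live, serviceActive: isServiceActive(live), corrected: cached !== live };
}
```

With this shape, webhooks become an optimisation for freshness rather than a correctness requirement.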
8. Deployment, security and personal data
The backend and the website are deployed as containers (Docker Compose) on my personal TrueNAS server —
the same one as the NAS project. Public exposure goes through an outbound HTTPS tunnel: no port is
opened on the home network, and certificates are managed automatically.
non-root containers with health probes;
secrets only in environment variables, never in the code or the repository;
accounts and history hosted in Europe;
short transcript retention and no raw audio stored;
strict per-user data isolation (RLS), per-user VoIP tokens and single-use login links;
data export and deletion built into the product.
The whole system now runs continuously and serves as the foundation for the planned next steps:
enriching the incoming-call screen (company verification, reason summary), live screening transcripts
in the app, and an iOS port.