SBC as an AI Voice Gateway: Architecture, Routing, and Security for Voice AI on the PSTN

Voice AI in 2026 is delivered as a stack: a real-time speech-to-text engine, an LLM orchestrator deciding what to say next, and a text-to-speech engine speaking it back. Teams deploying voicebots, AI receptionists, agent copilots, and automated outbound campaigns can build that stack relatively quickly. However, the layer most teams skip is the one between the AI cluster and the phone network it is supposed to talk to.

That layer is a Session Border Controller (SBC). An SBC sits between two SIP networks and terminates signaling and media on each side independently. It makes the voice AI cluster look like a normal SIP endpoint to the carrier, and makes the carrier look like a normal SIP peer to the AI cluster. In this article we cover what the SBC actually does in front of a voice AI platform: how encryption, codec negotiation, header normalization, and topology hiding work; how to use API-driven routing to ask an orchestrator which agent should answer; what STIR/SHAKEN looks like for AI-originated outbound; and why media quality matters more for a machine listener than it does for a human one.

Key Terms and Concepts

A quick-reference glossary for terms used throughout this article.

Voice AI platform: The cluster that runs speech-to-text, an LLM or rules-based orchestrator, and text-to-speech to handle a phone call autonomously. Examples range from packaged voicebot platforms to in-house stacks built on Asterisk, FreeSWITCH, or Kamailio.

STT (Speech-to-Text): The component that converts the caller’s audio into a text transcript in real time. Accuracy degrades quickly with jitter, packet loss, or transcoding artifacts on the media path.

TTS (Text-to-Speech): Speaks the orchestrator’s text response back to the caller in real time. Jitter buffer settings on the AI side affect how smooth that playback sounds.

Barge-in: The behavior that lets a caller interrupt the bot mid-sentence. It relies on voice activity detection and on a media path with low enough round-trip delay that the bot stops talking when the caller starts.

Media fork: The technique of copying RTP or SRTP packets to a second destination, such as a real-time analytics engine, an STT mirror, or a compliance recorder, without disrupting the primary call leg.

Agent orchestrator: The HTTP service that decides which voicebot, queue, or human agent should handle a given call, based on the caller, the called number, the time of day, attestation level, and any other business context.

API-driven SIP routing: The pattern where the SBC sends an HTTP query during INVITE processing to an external service and uses the response to choose the next hop. The alternative is a static route table updated out of band.

Attestation: The A, B, or C-level STIR/SHAKEN signal that tells terminating networks how confident the originator is in the calling number. Relevant for AI-originated outbound campaigns running through your SBC.

Codec: The encoding used for voice media on each leg. G.711 (PCMU/PCMA) is the safest default for AI bridges and is software-native on most SBCs; G.729, Opus, and AMR-WB usually require hardware transcoding.

B2BUA (Back-to-Back User Agent): An SBC architecture where the device fully terminates the SIP dialog on one leg and re-originates an independent one on the other. The detail behind this is covered in the SIP proxy vs B2BUA explainer; for a voice AI deployment, what matters is that the SBC can rewrite anything the carrier sent and present a clean SIP dialog to the AI.

NAP (Network Access Point): The TelcoBridges-specific term for a configured SIP peer. Each carrier, each voicebot endpoint, and each tenant on a multi-tenant deployment is typically its own NAP with its own encryption, codec, header, and routing settings.

Why Voice AI Needs an SBC at the PSTN Edge

The carrier side and the AI side of a voice AI deployment disagree about almost everything that matters on a SIP trunk: encryption posture, codec, header dialect, attestation, and IP reachability. The SBC is the device that lets them disagree gracefully.

Encryption boundary

A PSTN carrier may deliver SIP over UDP with unencrypted RTP, particularly on legacy interconnects. A voice AI platform almost always expects SIP over TLS and SRTP, because the cluster runs in a public cloud and the security review for letting plaintext voice into a tenant network would have failed a long time ago. The SBC terminates both transports independently per leg and converts between them transparently. The carrier never sees the AI cluster’s keys, and the AI cluster never sees the carrier’s plaintext. For the configuration mechanics, see the SBC TLS and SRTP configuration guide and the deeper SRTP technical reference.

Codec realities

Voice AI clouds vary in what they accept. Most accept G.711 (PCMU or PCMA) natively, because that is what the open-source SIP stacks they sit on top of negotiate by default. Some have Opus or G.729 endpoints, particularly when the platform was built mobile-first. ProSBC supports software-native G.711 ALAW and ULAW out of the box and supports Opus, G.729, and AMR-WB through hardware transcoding. The practical implication is straightforward: if the AI vendor accepts G.711, keep the call in G.711 end to end. Every transcoding hop adds artifacts that STT engines handle poorly.
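The "stay in G.711 when you can" rule can be sketched as a small selection function. This is an illustration only, not ProSBC configuration: the function names and the codec lists passed in are hypothetical, and the only fact taken from the article is that PCMU and PCMA are handled natively in software.

```python
from typing import Optional, Sequence

# G.711 variants the article describes as software-native (no transcoding needed).
SOFTWARE_NATIVE = ("PCMU", "PCMA")


def pick_passthrough_codec(carrier_offer: Sequence[str],
                           ai_accepts: Sequence[str]) -> Optional[str]:
    """Return a codec both legs share, or None if every option needs transcoding.

    G.711 variants win when available; otherwise the first shared codec in the
    carrier's preference order is used.
    """
    shared = [codec for codec in carrier_offer if codec in ai_accepts]
    for codec in SOFTWARE_NATIVE:
        if codec in shared:
            return codec
    return shared[0] if shared else None


def needs_transcoding(carrier_offer: Sequence[str],
                      ai_accepts: Sequence[str]) -> bool:
    """True when the two legs have no codec in common."""
    return pick_passthrough_codec(carrier_offer, ai_accepts) is None
```

A check like this is most useful at onboarding time: if `needs_transcoding` is true for a carrier and AI endpoint pair, that pairing deserves an STT accuracy test before it carries production traffic.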

SIP normalization

Voice AI platforms commonly run on top of FreeSWITCH, Kamailio, or Asterisk, and they emit a SIP dialect that does not match a Tier 1 carrier’s. From and P-Asserted-Identity (PAI) handling, session timers from RFC 4028, Diversion headers, and History-Info entries all need rewriting on the way in or on the way out. ProSBC’s SIP header manipulation engine is configurable per NAP, so the rules for a carrier-facing trunk group are independent from the rules for each voicebot endpoint. That separation is what makes it possible to swap a voice AI vendor without touching the carrier configuration.
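To make the per-NAP separation concrete, here is a minimal sketch of header rules keyed by NAP name. It does not use ProSBC's actual rule syntax. The NAP names, the `X-Bot-Session` header, and the specific rules are all hypothetical; only the idea of independent rule sets per NAP comes from the text above.

```python
from typing import Callable, Dict, List

Headers = Dict[str, str]
Rule = Callable[[Headers], Headers]


def drop(name: str) -> Rule:
    """Build a rule that removes a header (case-insensitive match)."""
    def apply(headers: Headers) -> Headers:
        return {k: v for k, v in headers.items() if k.lower() != name.lower()}
    return apply


def set_if_missing(name: str, value: str) -> Rule:
    """Build a rule that adds a header only when it is absent."""
    def apply(headers: Headers) -> Headers:
        if any(k.lower() == name.lower() for k in headers):
            return dict(headers)
        return {**headers, name: value}
    return apply


# Hypothetical rule sets: each NAP owns its list, so changing one never touches another.
RULES_BY_NAP: Dict[str, List[Rule]] = {
    "carrier_a": [drop("X-Bot-Session"), set_if_missing("Session-Expires", "1800")],
    "voicebot_1": [drop("Diversion")],
}


def normalize(nap: str, headers: Headers) -> Headers:
    """Apply the rules for one NAP in order and return a new header dict."""
    result = dict(headers)
    for rule in RULES_BY_NAP.get(nap, []):
        result = rule(result)
    return result
```

Swapping the voice AI vendor in this model means replacing the `voicebot_1` entry while `carrier_a` stays untouched.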

Topology hiding

An AI cluster running in a public cloud has private container IPs, internal load-balancer addresses, and service-mesh host names that should never appear in Via, Contact, or Record-Route headers visible to a carrier. The SBC replaces those internal references with its own public address, eliminating both an information-disclosure risk and the routing failures that occur when a private RFC 1918 address leaks into a public SIP path.
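A rough way to picture the effect, or to audit a captured trace for leaks, is a function that swaps RFC 1918 literals for the SBC's public address. A real B2BUA regenerates these headers on the new dialog rather than pattern-matching them, so treat this as an illustration of the outcome and not of the mechanism. The function name and sample addresses are hypothetical.

```python
import ipaddress
import re

# The three private IPv4 ranges defined by RFC 1918.
RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

# Dotted-quad candidates inside a header value.
_IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def hide_private_hosts(header_value: str, public_ip: str) -> str:
    """Replace RFC 1918 IPv4 literals with the SBC's public address."""
    def swap(match: re.Match) -> str:
        text = match.group(0)
        try:
            addr = ipaddress.ip_address(text)
        except ValueError:
            return text  # looked like an address but is not one; leave it alone
        return public_ip if any(addr in net for net in RFC1918) else text
    return _IPV4.sub(swap, header_value)
```

Running the same check over outbound Via, Contact, and Record-Route values in a SIP trace is a quick way to confirm nothing internal reaches the carrier.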

Reference architecture: ProSBC sits between the PSTN carrier and the voice AI platform. The carrier-facing leg handles whatever transport and codec the carrier supports; the AI-facing leg presents TLS, SRTP, and the codec the AI expects. During INVITE processing, the SBC queries an agent orchestrator over HTTPS to choose the next hop. On outbound, STIR/SHAKEN signing is applied before the call reaches the carrier.

Reference Architecture: PSTN, SBC, Voice AI Platform

Walk the diagram in both directions to see where the SBC carries the load.

Inbound call flow

A carrier INVITE arrives at the SBC’s public IP. ProSBC validates the source against its allow list, applies DoS protections, and normalizes the SIP message so the headers downstream consumers see are clean. The routing engine then runs an HTTP query against the agent orchestrator with the calling number, called number, P-Asserted-Identity, originating trunk group, and any business context the orchestrator needs. The response identifies the next hop, which might be a specific voicebot endpoint, a queue on a contact-center platform, or a human agent on a fallback PBX. The SBC re-originates a new SIP dialog toward the chosen destination, this time over TLS with SRTP media, presenting the call to the AI in the dialect it expects.

Outbound call flow

The AI platform places an outbound call: an appointment reminder, a callback, an agent-initiated follow-up. The INVITE lands on ProSBC’s AI-facing NAP. The routing engine looks up the appropriate carrier based on the called number, applies any per-tenant policy (more on multi-tenant below), and hands the call to the STIR/SHAKEN signing service over SIP. Once the Identity header is in place, the SBC re-originates toward the carrier with whatever transport that carrier supports.

Optional media fork

Many voice AI deployments want a second copy of the media for analytics, compliance recording, or a parallel STT mirror. ProSBC supports media playback and recording natively; real-time forking to an external streaming destination is partner-dependent. The pattern is worth knowing about, because asking the SBC to do it cleanly is usually simpler than asking the AI platform to multicast its own media.

API-Driven Routing at INVITE Time

The most underused capability when teams put a voice AI behind a phone number is using the SBC to ask the agent orchestrator, at call setup, where the call should go. Static route tables work for predictable carrier-to-bot mappings. They struggle when the bot platform wants to make routing decisions based on context the SBC does not have: caller identity, business hours, attestation level, tenant configuration, A/B test bucket, or whether a human agent is available.

ProSBC exposes its routing engine through a programmable filter chain. A before_filter on an inbound call can issue an HTTPS request to the orchestrator with the call’s parameters, wait up to a configured timeout (typically 500 to 2,500 ms, to bound the setup delay callers experience), and use the JSON response to choose the next hop, override attestation, or attach metadata that downstream systems consume. The mechanics, the redundancy patterns, and the failure modes are covered in depth in the SBC REST API call routing integration guide; the SBC API integration solution page describes the platform-level capability.
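The shape of that exchange can be sketched in a few lines. This is not ProSBC's filter API; it is a language-neutral illustration written in Python. The JSON field names (`calling`, `called`, `pai`, `trunk_group`, `next_hop`, `attestation`) and the 0.5 second default are assumptions chosen for the example, and a real orchestrator defines its own schema.

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class RouteDecision:
    next_hop: str                # destination name chosen by the orchestrator (hypothetical field)
    attestation: Optional[str]   # "A", "B", "C", or None to leave the default in place


def build_query(calling: str, called: str, pai: str, trunk_group: str) -> bytes:
    """Serialize the call parameters sent to the orchestrator."""
    return json.dumps({
        "calling": calling,
        "called": called,
        "pai": pai,
        "trunk_group": trunk_group,
    }).encode("utf-8")


def parse_decision(body: bytes) -> RouteDecision:
    """Validate the orchestrator's JSON answer before acting on it."""
    data = json.loads(body)
    attestation = data.get("attestation")
    if attestation not in (None, "A", "B", "C"):
        raise ValueError(f"unexpected attestation level: {attestation!r}")
    return RouteDecision(next_hop=data["next_hop"], attestation=attestation)


def query_orchestrator(url: str, payload: bytes, timeout_s: float = 0.5) -> RouteDecision:
    """POST the query and give up after timeout_s seconds."""
    request = urllib.request.Request(
        url, data=payload,
        headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return parse_decision(response.read())
```

Validating the response separately from fetching it keeps a malformed answer from being treated as a routing decision; the caller can handle it the same way as a timeout.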

Use cases this enables for voice AI

VIP routing to a human identifies known high-value callers from CRM data during INVITE processing and routes them past the bot to a designated agent.
Anomalous-call routing sends suspicious traffic (unusual volume, mismatched ANI, low attestation) back through a fraud-scoring service before the call ever reaches the AI compute.
Language detection routing chooses an en-US, es-MX, or fr-FR voicebot endpoint based on the calling region, called number, or a quick orchestrator lookup against caller history.
Tenant-aware routing directs each customer’s traffic to its own bot configuration on a shared SBC, with per-tenant codec, attestation, and recording policy.
A/B testing of voicebot versions routes a controlled percentage of traffic to a candidate model and the rest to production, using the orchestrator response to record which bucket each call landed in.
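For the A/B case, the orchestrator needs a bucket assignment that is stable for a given key and roughly proportional across traffic. One common approach, shown here as a sketch and not as anything the article's platform prescribes, is to hash a call key into 100 slots. The function name and the choice of key are hypothetical.

```python
import hashlib


def ab_bucket(call_key: str, candidate_percent: int) -> str:
    """Deterministically assign a key to 'candidate' or 'production'.

    The same key always lands in the same bucket, so a repeat caller keeps
    hearing the same bot version if the calling number is used as the key.
    """
    if not 0 <= candidate_percent <= 100:
        raise ValueError("candidate_percent must be between 0 and 100")
    digest = hashlib.sha256(call_key.encode("utf-8")).digest()
    slot = int.from_bytes(digest[:8], "big") % 100  # value in 0..99
    return "candidate" if slot < candidate_percent else "production"
```

Returning the bucket name in the orchestrator response lets the SBC carry it into the CDR, which is what makes the two versions comparable afterwards.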

Failure modes

The routing query needs an explicit timeout and an explicit fallback. Primary and secondary orchestrator URLs, a static route table as a final fallback, and a hard rule that a missed orchestrator response never blocks call completion are all baseline. The SBC carries the call; the orchestrator makes the decision. If the orchestrator is down, the SBC’s job is to still complete the call to a sensible default destination, not to hold the INVITE waiting.
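The fallback chain described above reduces to a short loop. In this sketch each fetcher stands for one orchestrator URL and is assumed to enforce its own timeout; the function and destination names are hypothetical.

```python
from typing import Callable, Sequence


def route_with_fallback(fetchers: Sequence[Callable[[], str]],
                        static_default: str) -> str:
    """Try each orchestrator in order; fall back to a static route if all fail."""
    for fetch in fetchers:
        try:
            next_hop = fetch()
        except Exception:
            # Timeout, connection error, or malformed answer: all mean "no decision".
            continue
        if next_hop:
            return next_hop
    return static_default


# Stand-ins for real HTTP calls, used to exercise the chain.
def primary_down() -> str:
    raise TimeoutError("primary orchestrator did not answer in time")


def secondary_ok() -> str:
    return "voicebot_en_us"
```

With `[primary_down, secondary_ok]` the call goes to `voicebot_en_us`; with every fetcher failing it goes to the static default. Note that the worst-case setup delay is the sum of the individual timeouts, which is a reason to keep each one short.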

STIR/SHAKEN for AI-Originated Outbound

Outbound calling at machine speed is a fraud vector and a regulatory exposure point. A misconfigured automation loop can place tens of thousands of calls before anyone notices, and a voice AI platform with a single misbehaving tenant can pull down attestation for everyone else sharing the same upstream carrier. The FCC’s own-certificate rule, effective September 2025, means the voice service provider that operates the platform is on the hook for the attestation level it signs at. That changes the math for anyone running an outbound voicebot.

How the SBC handles signing

ProSBC integrates with TransNexus ClearIP and Neustar over SIP for both signing and verification, which is the production-deployed pattern for every TelcoBridges customer running STIR/SHAKEN today. On an outbound call, ProSBC routes through a signing NAP whose service_type is set to AUTHENTICATION, receives the signed Identity header back in a SIP 302 response, attaches it to the outgoing INVITE, and continues to the carrier. The signing service stays out of the media path entirely. Redundancy is expressed through route ordering and Reason Cause Mapping, so a signing-service outage never blocks call completion.
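The header hand-off in that flow can be illustrated with plain string handling. This is a simplified sketch, not how an SBC parses SIP: it ignores header folding and compact header forms, and the function names and sample messages are hypothetical.

```python
from typing import Optional


def extract_identity(sip_message: str) -> Optional[str]:
    """Return the Identity header value from a raw SIP message, or None."""
    head = sip_message.split("\r\n\r\n", 1)[0]
    for line in head.split("\r\n")[1:]:          # skip the start line
        name, sep, value = line.partition(":")   # split on the first colon only
        if sep and name.strip().lower() == "identity":
            return value.strip()
    return None


def attach_identity(invite: str, identity: str) -> str:
    """Append an Identity header to the end of the INVITE's header section."""
    head, sep, body = invite.partition("\r\n\r\n")
    return f"{head}\r\nIdentity: {identity}{sep}{body}"
```

If `extract_identity` returns None for the signing service's response, the caller decides whether to try the next route or send the call unsigned, which is the redundancy decision the paragraph above assigns to route ordering.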

Per-call attestation matters for voice AI

Blanket A-level signing of every bot-originated call is the wrong default for a platform that mixes retail traffic (verified business calling its own customers), wholesale traffic (a reseller passing calls through), and unmanaged tenant traffic (high-risk outbound campaigns). The right pattern is to decide attestation per call based on the originating tenant, the called destination, and known caller-to-number authorization. A programmable SBC makes that decision part of the routing logic instead of a per-trunk static setting. The A-level attestation guide covers the KYC and number-authorization requirements that need to be true on the business side for A-level to hold.
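A per-call decision of this kind might look like the sketch below. It is deliberately simplified: the `Tenant` record, its fields, and the mapping of an unverified tenant to C are assumptions for illustration, and real attestation policy depends on the provider's KYC process and the governing SHAKEN rules.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Tenant:
    name: str
    kyc_verified: bool                                   # tenant identity has been verified
    authorized_numbers: Set[str] = field(default_factory=set)


def choose_attestation(tenant: Tenant, calling_number: str) -> str:
    """Pick an attestation level for one outbound call."""
    if not tenant.kyc_verified:
        return "C"   # the originator cannot be vouched for
    if calling_number in tenant.authorized_numbers:
        return "A"   # known customer, and authorized to use this number
    return "B"       # known customer, number authorization not established
```

Because the function takes the calling number as well as the tenant, one tenant can receive A on its own numbers and B on everything else, which a per-trunk static setting cannot express.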

Pair attestation with toll-fraud limits

A single misconfigured automation loop has generated five-figure carrier bills in the field. Per-NAP concurrent-call ceilings, per-source CPS limits, and integration with a fraud detection partner (TransNexus, SecureLogix, YouMail) are inexpensive insurance. The STIR/SHAKEN solution page covers the broader product context.
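A per-source calls-per-second limit is simple enough to sketch as a one-second sliding window. This illustrates the idea and is not ProSBC's implementation; the class name is hypothetical, and the clock is passed in explicitly so the behavior is easy to test.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class CpsLimiter:
    """Allow at most max_cps call attempts per source in any one-second window."""

    def __init__(self, max_cps: int) -> None:
        self.max_cps = max_cps
        self._recent: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, source: str, now: float) -> bool:
        window = self._recent[source]
        # Forget attempts that are a full second old or older.
        while window and now - window[0] >= 1.0:
            window.popleft()
        if len(window) >= self.max_cps:
            return False   # over the limit; rejected attempts are not recorded
        window.append(now)
        return True
```

In production `now` would come from a monotonic clock such as `time.monotonic()`. A runaway loop then hits a hard ceiling per source while other tenants continue unaffected.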

Media Quality Matters More for AI Than for Humans

A human caller tolerates 80 ms of jitter and a short burst of packet loss without thinking about it. A speech-to-text engine notices both. STT word-error rate climbs fast with jitter, packet loss, codec transcoding artifacts, and aggressive echo cancellation, and a bot that mishears the caller will respond in ways that look like the AI failed even though the failure was in the transport.

The metrics that matter

Different parts of the AI pipeline care about different metrics. STT is sensitive to jitter and packet loss because both produce dropped or distorted phonemes. Barge-in is sensitive to round-trip delay on the media path, because the bot needs to detect the caller starting to speak and stop talking within roughly 200 ms to feel natural. TTS playback is sensitive to jitter buffer settings on the AI side. ProSBC exposes per-call MOS scoring, CDR output, SIP call trace, and live Wireshark capture so the data exists to correlate quality complaints with the leg they came from. The VoIP monitoring best practices guide covers the metric thresholds and the SBC data streams in more depth.
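For readers who want to see what a jitter figure actually measures, the interarrival jitter estimator from RFC 3550 is short enough to show in full. The function below is a sketch that takes send and receive times in milliseconds; RFC 3550 itself works in RTP timestamp units, so the units here are a simplification for readability.

```python
from typing import Sequence


def interarrival_jitter(send_ms: Sequence[float], recv_ms: Sequence[float]) -> float:
    """Running jitter estimate following the RFC 3550 formula J += (|D| - J) / 16.

    D is the change in transit time between consecutive packets.
    """
    if len(send_ms) != len(recv_ms):
        raise ValueError("send and receive lists must be the same length")
    jitter = 0.0
    for i in range(1, len(send_ms)):
        d = (recv_ms[i] - recv_ms[i - 1]) - (send_ms[i] - send_ms[i - 1])
        jitter += (abs(d) - jitter) / 16.0
    return jitter
```

Packets sent every 20 ms and received every 20 ms yield zero. A single packet arriving 10 ms late moves the estimate by only 10/16 ms, which is why a smoothed jitter number can look healthy while short bursts are still damaging STT input.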

Codec choices for AI bridges

G.711 native is the safest default when the AI vendor accepts it. Transcoding from Opus on the AI side to G.711 on the carrier side adds artifacts that STT models were not trained on, and the cost shows up as a measurable accuracy drop in production. When the voice AI vendor publishes a supported codec list, treat it as a routing input rather than an assumption. Configure the NAP for each AI endpoint with the codec preference it actually wants and let the SBC negotiate accordingly.

Multi-Tenant Deployments for Voice AI Platforms

Platforms running voice AI as a service look a lot like CPaaS operators. The SBC pattern looks similar too. One ProSBC instance can host up to 1,024 NAPs, which is enough for several hundred tenants when each tenant gets its own carrier-facing and bot-facing NAP plus any specialty trunks. Per-tenant CDR streams make it possible to bill and report independently. Per-tenant attestation policy is important because a SaaS bot platform may have one tenant running A-level retail traffic and another doing C-level high-volume outbound from the same compute footprint; treating them identically is what gets the wholesale carrier to start filtering. The architectural pattern is essentially the same as the one described in the multi-tenant Teams Direct Routing guide, with the AI cluster substituted for the tenant’s Microsoft 365 environment. The CPaaS and SBCs solution page describes the broader pattern.
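The tenant-capacity arithmetic is worth making explicit when planning a deployment. The 1,024 figure comes from the text above; the function name and the notion of reserving some NAPs for shared use are hypothetical.

```python
MAX_NAPS = 1024  # per-instance NAP limit stated in the article


def max_tenants(naps_per_tenant: int, shared_naps: int = 0) -> int:
    """How many tenants fit once shared NAPs are set aside."""
    if naps_per_tenant <= 0:
        raise ValueError("naps_per_tenant must be positive")
    if not 0 <= shared_naps <= MAX_NAPS:
        raise ValueError("shared_naps must be between 0 and MAX_NAPS")
    return (MAX_NAPS - shared_naps) // naps_per_tenant
```

With one carrier-facing and one bot-facing NAP per tenant, `max_tenants(2)` gives 512. Adding one specialty trunk per tenant and reserving four shared NAPs, `max_tenants(3, shared_naps=4)` gives 340.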

Security at the AI Gateway Boundary

The SBC in front of a voice AI platform is internet-facing on the carrier side and exposed to whatever the AI cloud’s network can produce on the other. Both surfaces matter.

Carrier-side protections

SIP-aware rate limiting, dynamic blacklisting with percentage-based greylisting, SIP registration scanning protection, and built-in DoS and DDoS mitigation are the baseline. The SBC security overview covers the layered approach; the SIP DoS attack prevention guide covers the specific flood types and how the SBC catches them before they reach the AI cluster.

AI-side protections

A misbehaving automation does not need to be malicious to be expensive. Per-NAP concurrent-call ceilings on the AI-facing trunk group contain runaway loops to a predictable maximum. Mutual TLS between the SBC and the AI platform raises the bar on connection authentication, particularly when the AI cluster is in a separate cloud account or operated by a different team than the SBC.
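A concurrent-call ceiling is a counter with an upper bound per NAP. The sketch below shows the bookkeeping only; the class name and NAP names are hypothetical, and a NAP with no configured limit is rejected outright, which is a design choice made for this example.

```python
from collections import defaultdict
from typing import Dict


class ConcurrentCallCeiling:
    """Track active calls per NAP and refuse new ones above the configured limit."""

    def __init__(self, limits: Dict[str, int]) -> None:
        self._limits = limits
        self._active: Dict[str, int] = defaultdict(int)

    def try_start(self, nap: str) -> bool:
        # NAPs without an entry get a limit of 0, so unknown sources are refused.
        if self._active[nap] >= self._limits.get(nap, 0):
            return False
        self._active[nap] += 1
        return True

    def end(self, nap: str) -> None:
        # Guard against going negative if a release is reported twice.
        if self._active[nap] > 0:
            self._active[nap] -= 1
```

The important property is that the worst case is known in advance: a looping automation on `voicebot_1` can never hold more sessions than that NAP's limit, regardless of how fast it retries.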

Observable behavior

Most voice AI incidents look like quality problems at first and turn out to be configuration problems on closer inspection. The SBC’s CDR, SIP trace, and live capture tooling is where most of those investigations end. Wiring those data streams into the same observability platform the rest of the voice infrastructure already feeds is the prerequisite for catching issues before customers do.

Frequently Asked Questions

Does ProSBC do speech-to-text or text-to-speech?

No. ProSBC is the SIP and media gateway. The voice AI platform handles STT, the LLM or rules-based orchestrator, and TTS. ProSBC handles transport, encryption, codec, attestation, routing, and observability for the call that connects them.

Can the SBC route between a human agent and a voicebot in the same call?

Yes. The routing engine can transfer or re-INVITE based on orchestrator decisions, and the SBC enforces the SIP and media side of the handover. A common pattern is bot-first answering with escalation to a human when the orchestrator detects intent it cannot handle.

Do I need transcoding for voice AI?

It depends on what the AI platform accepts. If both legs run G.711, no transcoding is needed and that is the safest default for STT accuracy. ProSBC supports G.711 ALAW and ULAW natively in software; Opus, G.729, and AMR-WB are supported through hardware transcoding today.

How does the SBC help with STIR/SHAKEN for AI-originated outbound?

ProSBC integrates with a signing service such as TransNexus ClearIP or Neustar over SIP, attaches the signed Identity header to outbound INVITEs, and lets the routing logic choose the attestation level per call based on tenant and destination. The pattern is the same as the one used for traditional carrier outbound; the difference is that per-call attestation matters more when the originator is a multi-tenant platform.

Can I sit ProSBC in front of an in-house voice AI built on FreeSWITCH, Asterisk, or Kamailio?

Yes. That is a common deployment pattern. ProSBC normalizes the SIP between an in-house AI stack and the carrier, and it is platform-agnostic. The AI platform can keep its native SIP dialect on the inside and the carrier sees a clean, carrier-grade SIP dialog on the outside.

Is there a free way to test this before committing?

Yes. ProSBC Lab is a permanently free 3-session license, self-serve in roughly 20 minutes, with full configuration access. The 30-day trial provides 500 concurrent sessions and is also self-serve.

Conclusion

Voice AI does not change what an SBC does. It raises the stakes on the things an SBC has always done. Encryption, header normalization, attestation, programmable routing, and observability all become more load-bearing when the entity on the other end of the SIP trunk is a machine handling thousands of conversations a day and the regulatory line on attestation is being drawn around your provider account.

The architectural choice is between treating the SBC as a passive translator that the AI happens to sit behind, and treating it as the configurable transport-and-trust layer that lets a voice AI platform behave like a real carrier endpoint. The second framing is what makes a deployment hold up at production volume.

Deploy a Voice AI Gateway with ProSBC

ProSBC is a carrier-grade software SBC built on more than twenty years of SIP deployment experience. It is the configurable gateway that carries voice AI traffic, not an AI product itself, and it covers the functions this article walks through directly.

The B2BUA architecture provides independent TLS and SRTP configuration per trunk group, so the AI-facing leg and the carrier-facing leg negotiate transport independently. The SIP header manipulation engine is configurable per NAP, with up to 1,024 NAPs on a single instance for multi-tenant voicebot platforms. The programmable routing engine supports HTTP queries against an agent orchestrator during INVITE processing, with primary and secondary URLs and explicit fallback. STIR/SHAKEN signing and verification integrate with TransNexus ClearIP and Neustar over SIP, with per-call attestation decided in routing logic. DoS protection, dynamic blacklisting, SIP registration scanning protection, and per-NAP concurrent-call ceilings are included.

ProSBC deploys natively on AWS, Microsoft Azure, VMware, KVM and Proxmox, and bare metal, so it can sit next to the voice AI cluster regardless of where that cluster runs. Cloud-native and virtualized deployment options cover the most common production patterns.

Prefer to evaluate on your own first? Start your 30-day free trial.