What is SRTP? Secure Real-Time Transport Protocol Explained

SRTP encryption protecting VoIP media streams between endpoints

Every VoIP call travels across the network as a stream of small data packets carried by RTP (Real-time Transport Protocol). By default, RTP packets carry voice and video unencrypted. Anyone with access to the network path can capture those packets and listen to the entire conversation. SRTP fixes that problem.
SRTP (Secure Real-time Transport Protocol) is a security profile for RTP that adds encryption, message authentication, and replay protection to voice and video media streams. Defined in RFC 3711 and published by the IETF in 2004, SRTP was developed by cryptographic engineers at Cisco and Ericsson specifically for the constraints of real-time communications: it needs to be fast, tolerate packet loss, and add minimal overhead.
If you are a VoIP infrastructure operator, SRTP is no longer optional: Microsoft Teams Direct Routing requires it, WebRTC mandates it, and healthcare and financial services compliance frameworks expect it. This guide explains what SRTP does, how it works under the hood, and what role a Session Border Controller (SBC) plays in deploying it across real-world networks.

Key Terms and Concepts
A quick-reference glossary for terms used throughout this article.
SRTP (Secure Real-time Transport Protocol): A security profile for RTP defined in RFC 3711 that adds AES encryption, HMAC-SHA1 authentication, and replay protection to voice and video media streams. Adds a few percent of bandwidth overhead (about 2 to 5% for a G.711 call, depending on authentication tag length) with negligible latency impact.
RTP (Real-time Transport Protocol): The standard protocol for delivering voice and video over IP networks. RTP carries media payloads in UDP packets but provides no encryption, authentication, or replay protection. SRTP is its secured counterpart.
AES-CM (Advanced Encryption Standard, Counter Mode): The default cipher used by SRTP. AES in Counter Mode turns the block cipher into a stream cipher, allowing each packet to be decrypted independently without requiring sequential processing. This is critical for VoIP, where packets routinely arrive out of order or are lost.
HMAC-SHA1: Hash-based Message Authentication Code using SHA-1. SRTP appends an HMAC-SHA1 tag (80-bit or 32-bit) to each packet so the receiver can verify the packet has not been tampered with or forged. Packets with invalid tags are discarded.
SDES (Session Description Protocol Security Descriptions): A key exchange method where the SRTP master key is embedded directly in the SDP body of SIP signaling messages. Simple and widely supported, but requires TLS on the signaling path to protect the key from exposure in cleartext.
DTLS-SRTP (Datagram Transport Layer Security for SRTP): A key exchange method where a DTLS handshake is performed directly on the media path to establish SRTP keys. More robust than SDES because the keys never appear in signaling, so it does not depend on signaling-layer encryption (the SDP carries only a certificate fingerprint, which must still be protected from tampering). Mandatory for WebRTC.
SRTCP (Secure RTCP): The secured version of RTCP (RTP Control Protocol). SRTP’s key derivation function generates separate keys for SRTCP, protecting the control channel alongside the media stream.
TLS (Transport Layer Security): The encryption protocol used to secure SIP signaling, typically on port 5061. TLS protects call setup metadata, routing information, and (when using SDES) the SRTP master key exchange. SRTP and TLS protect different data flows; both are required for complete VoIP security.
B2BUA (Back-to-Back User Agent): An SBC architecture where the device fully terminates the incoming SIP dialog and re-originates a new, independent one on the other side. This gives the SBC complete control over signaling and media on each leg, enabling independent SRTP negotiation, RTP-to-SRTP conversion, and topology hiding.
RTP-to-SRTP Conversion: The process where an SBC accepts plain RTP from a legacy endpoint, encrypts it into SRTP, and forwards it to an SRTP-mandatory destination. On the return path, the SBC decrypts SRTP back to plain RTP. This bridges legacy voice infrastructure with modern encrypted platforms without equipment replacement.

Why VoIP Media Encryption Matters

Standard RTP was designed for efficiency, not security. It carries no encryption, no integrity verification, and no protection against replay attacks. In practice, this means:
Eavesdropping is the most direct risk. An attacker who captures RTP packets on a shared network, Wi-Fi segment, or cloud infrastructure link can decode the audio using freely available tools like Wireshark. The voice data is right there in the packet payload.
Packet injection becomes possible because RTP has no authentication. An attacker can inject fabricated packets into an active RTP stream, and the receiving endpoint has no way to distinguish legitimate packets from forged ones.
Replay attacks happen when an attacker records RTP packets and retransmits them later, potentially disrupting active calls or replaying sensitive conversation fragments.
Man-in-the-middle (MitM) attacks allow an attacker positioned between two endpoints to intercept, modify, or redirect RTP streams without either party knowing. Without encryption and authentication, there is no mechanism for the endpoints to verify that packets are arriving unaltered from the expected source.
These are not theoretical risks. VoIP traffic regularly crosses the public internet between SIP trunk providers, cloud platforms, and remote endpoints. Any hop along that path is an exposure point. And as organizations move to hybrid and remote work models, voice traffic increasingly traverses networks that the organization does not control.
Beyond security hygiene, several compliance and platform requirements now mandate media encryption:

  • Microsoft Teams Direct Routing requires SRTP for all media. An SBC that cannot negotiate SRTP will not pass Teams certification.
  • WebRTC mandates DTLS-SRTP. Browsers will not establish WebRTC media sessions without it.
  • HIPAA’s Security Rule calls for transmission security for electronic protected health information (ePHI), including voice communications carrying patient data; encryption in transit is the standard way to meet it.
  • PCI DSS requires strong cryptography for cardholder data sent over open, public networks, including spoken card numbers carried over VoIP.
  • FCC guidance increasingly references media-layer encryption as part of VoIP network security best practices.

How SRTP Works

SRTP sits on top of RTP. It takes a standard RTP packet, encrypts the payload, appends an authentication tag, and sends it over the same UDP transport that RTP uses. The receiving side verifies the authentication tag, decrypts the payload, and passes the media to the application. The process adds minimal latency because SRTP uses stream cipher modes designed for real-time tolerance.
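The packet flow above can be sketched in Python using only the standard library. The function below splits an SRTP packet into the three parts SRTP treats differently: the RTP header (sent in the clear but covered by the tag), the encrypted payload, and the trailing authentication tag. It is a minimal illustration that assumes no MKI field and a tag length supplied by the caller; `split_srtp_packet` and `SrtpPacketParts` are hypothetical names, not part of any SRTP library.

```python
import struct
from dataclasses import dataclass


@dataclass
class SrtpPacketParts:
    header: bytes             # RTP header: sent in the clear, covered by the tag
    encrypted_payload: bytes  # AES-CM ciphertext (RTP padding, if any, is inside it)
    auth_tag: bytes           # truncated HMAC-SHA1 appended by SRTP


def split_srtp_packet(packet: bytes, tag_len: int = 10) -> SrtpPacketParts:
    """Split an SRTP packet that carries no MKI field.

    tag_len is 10 bytes for the *_HMAC_SHA1_80 suites and 4 for *_HMAC_SHA1_32.
    """
    if len(packet) < 12 + tag_len:
        raise ValueError("packet too short for an RTP header plus tag")
    if packet[0] >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    csrc_count = packet[0] & 0x0F
    has_extension = bool(packet[0] & 0x10)

    # Fixed 12-byte header plus 4 bytes per contributing source (CSRC).
    header_len = 12 + 4 * csrc_count
    if has_extension:
        # Extension header: 16-bit profile field, 16-bit length in 32-bit words.
        if len(packet) < header_len + 4:
            raise ValueError("truncated header extension")
        (ext_words,) = struct.unpack_from("!H", packet, header_len + 2)
        header_len += 4 + 4 * ext_words

    payload_end = len(packet) - tag_len
    if payload_end < header_len:
        raise ValueError("declared header overlaps the authentication tag")
    return SrtpPacketParts(
        header=packet[:header_len],
        encrypted_payload=packet[header_len:payload_end],
        auth_tag=packet[payload_end:],
    )
```

A plain RTP packet has the same header layout but no tag, which is why SRTP traffic can follow the same UDP paths and ports as RTP.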

Encryption: AES in Counter Mode

SRTP uses the Advanced Encryption Standard (AES) as its default cipher. Specifically, it operates AES in Counter Mode (AES-CM), which turns the block cipher into a stream cipher. This is a deliberate design choice for real-time media.
Counter Mode generates a keystream by encrypting a sequence of counter values. The keystream is then XORed with the plaintext payload to produce ciphertext. Each packet’s starting counter is computed from values the receiver already knows: the session salt, the stream’s SSRC, and the packet’s index (its 16-bit sequence number extended by a rollover counter). The receiver can therefore decrypt any packet without needing to have received all previous packets first. This is critical for VoIP, where packets routinely arrive out of order or get dropped entirely. A cipher mode that required sequential decryption would break under normal network conditions.
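The per-packet independence comes from how the counter block is built. The sketch below follows the IV formula in RFC 3711 section 4.1.1, IV = (k_s * 2^16) XOR (SSRC * 2^64) XOR (i * 2^16), where k_s is the 112-bit session salt and i is the 48-bit packet index. Python’s standard library has no AES, so the block cipher is passed in as a function (an assumption of this sketch); in a real deployment you would plug in AES-128 from a crypto library. The function names are illustrative.

```python
from typing import Callable

# (key, 16-byte block) -> 16-byte block. Real SRTP uses AES-128 here; Python's
# standard library has no AES, so the caller supplies one (assumption).
BlockCipher = Callable[[bytes, bytes], bytes]


def packet_index(roc: int, seq: int) -> int:
    """48-bit SRTP packet index: i = 2^16 * ROC + SEQ."""
    return (roc << 16) | (seq & 0xFFFF)


def aes_cm_iv(session_salt: bytes, ssrc: int, index: int) -> int:
    """IV = (k_s * 2^16) XOR (SSRC * 2^64) XOR (i * 2^16), RFC 3711 section 4.1.1."""
    if len(session_salt) != 14:
        raise ValueError("session salt must be 112 bits (14 bytes)")
    k_s = int.from_bytes(session_salt, "big")
    return (k_s << 16) ^ (ssrc << 64) ^ (index << 16)


def keystream(encrypt_block: BlockCipher, key: bytes, iv: int, length: int) -> bytes:
    """Encrypt the counter blocks IV, IV+1, IV+2, ... and keep `length` bytes."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = ((iv + counter) % (1 << 128)).to_bytes(16, "big")
        out += encrypt_block(key, block)
        counter += 1
    return bytes(out[:length])


def crypt_payload(encrypt_block: BlockCipher, session_key: bytes, session_salt: bytes,
                  ssrc: int, roc: int, seq: int, data: bytes) -> bytes:
    """Encrypt or decrypt one payload. XOR with the keystream is its own inverse,
    and the keystream depends only on this packet's SSRC and index."""
    iv = aes_cm_iv(session_salt, ssrc, packet_index(roc, seq))
    ks = keystream(encrypt_block, session_key, iv, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))
```

Because `crypt_payload` needs only the packet’s own SSRC, ROC, and sequence number, packet 3 can be decrypted before packets 1 and 2 arrive, or if they never arrive at all.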
The two standard crypto suites defined for SRTP are:

  • AES_CM_128_HMAC_SHA1_80 (128-bit AES key, 80-bit authentication tag)
  • AES_CM_128_HMAC_SHA1_32 (128-bit AES key, 32-bit authentication tag)

The 80-bit variant provides stronger authentication and is the default for most deployments.

Message Authentication: HMAC-SHA1

Encryption alone does not prevent tampering. An attacker could flip bits in the encrypted payload, and the receiver would decrypt it into corrupted audio without knowing the data had been modified.
SRTP addresses this with HMAC-SHA1 (Hash-based Message Authentication Code using SHA-1). For each packet, SRTP computes an HMAC over the RTP header, the encrypted payload, and the stream’s 32-bit rollover counter (which both sides track but which is not sent in the packet), then truncates the result to either 80 bits or 32 bits. This authentication tag is appended to the packet. The receiver recomputes the HMAC independently and compares it. If the tags do not match, the packet is discarded.
This protects against both packet modification and packet injection. An attacker cannot forge a valid authentication tag without the secret key.
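Because HMAC-SHA1 is in Python’s standard library, the tag computation can be sketched directly. Following RFC 3711 section 4.2, the MAC input is the authenticated portion of the packet (RTP header plus encrypted payload) followed by the rollover counter in network byte order. The sketch assumes the 160-bit session authentication key has already been derived; the function names are hypothetical.

```python
import hashlib
import hmac


def srtp_auth_tag(auth_key: bytes, authenticated_portion: bytes, roc: int,
                  tag_len: int = 10) -> bytes:
    """HMAC-SHA1 over (RTP header || encrypted payload || ROC), truncated.

    tag_len is 10 bytes for the _80 suites and 4 bytes for the _32 suites.
    """
    message = authenticated_portion + roc.to_bytes(4, "big")
    full_mac = hmac.new(auth_key, message, hashlib.sha1).digest()  # 20 bytes
    return full_mac[:tag_len]


def verify_srtp_packet(auth_key: bytes, packet: bytes, roc: int, tag_len: int = 10) -> bool:
    """Return True if the trailing tag is valid; the caller drops the packet otherwise."""
    if len(packet) <= tag_len:
        return False
    body, received_tag = packet[:len(packet) - tag_len], packet[len(packet) - tag_len:]
    expected = srtp_auth_tag(auth_key, body, roc, tag_len)
    # Constant-time comparison avoids leaking how many tag bytes matched.
    return hmac.compare_digest(expected, received_tag)
```

Flipping a single bit anywhere in the header or ciphertext changes the HMAC, so a tampered or injected packet fails verification before it is ever decrypted.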

Replay Protection

SRTP maintains a replay list: a sliding window over recently received packet indices (the 16-bit sequence number extended by a 32-bit rollover counter). If a packet arrives with an index that has already been seen, or one so old that it falls behind the window, the packet is dropped. This prevents an attacker from recording encrypted packets and retransmitting them to disrupt or confuse a call.
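A replay list like this is commonly implemented as a bitmask anchored at the highest index accepted so far. The sketch below assumes a 64-packet window (RFC 3711 section 3.3.2 sets 64 as the minimum) and is illustrative rather than taken from any particular SRTP stack. Checking and recording are split on purpose: the window should only be updated after the packet’s authentication tag has verified, so forged packets cannot move it.

```python
class ReplayWindow:
    """Sliding replay window over 48-bit SRTP packet indices.

    Bit k of `mask` records whether index (highest - k) has been accepted.
    """

    def __init__(self, size: int = 64) -> None:
        self.size = size
        self.highest = -1  # no packet accepted yet
        self.mask = 0

    def check(self, index: int) -> bool:
        """True if `index` is new and not too old. Does not change state."""
        if self.highest < 0 or index > self.highest:
            return True
        delta = self.highest - index
        if delta >= self.size:
            return False  # behind the window: too old to judge, so drop
        return not (self.mask >> delta) & 1

    def update(self, index: int) -> None:
        """Record `index`. Call only after the packet's auth tag verified."""
        if self.highest < 0:
            self.highest, self.mask = index, 1
        elif index > self.highest:
            shift = index - self.highest
            self.mask = ((self.mask << shift) | 1) & ((1 << self.size) - 1)
            self.highest = index
        elif self.highest - index < self.size:
            self.mask |= 1 << (self.highest - index)
```

Late packets inside the window (such as 11 arriving after 12) are still accepted once, which keeps ordinary network reordering from being mistaken for a replay.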

Key Derivation and Key Exchange

SRTP uses a key derivation function (KDF) to generate multiple session keys from a single master key and master salt. The KDF produces separate encryption keys, authentication keys, and salting keys for both the SRTP stream and its companion SRTCP (Secure RTCP) stream. This means the key management protocol only needs to deliver one master key (plus its salt) per session.
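The derivation step itself is short. In RFC 3711 section 4.3, each session key is the AES-CM keystream of the master key, started at an IV built from a one-byte label XORed into the master salt; labels 0x00 to 0x02 give the SRTP encryption, authentication, and salt keys, and 0x03 to 0x05 give the SRTCP ones. As with the counter-mode sketch, Python’s standard library lacks AES, so the block cipher is a caller-supplied function (an assumption); the dictionary and function names are hypothetical.

```python
from typing import Callable

BlockCipher = Callable[[bytes, bytes], bytes]  # AES-128 in real SRTP (caller-supplied)

# Key derivation labels, RFC 3711 section 4.3.
LABELS = {
    "srtp_encryption": 0x00, "srtp_auth": 0x01, "srtp_salt": 0x02,
    "srtcp_encryption": 0x03, "srtcp_auth": 0x04, "srtcp_salt": 0x05,
}
# Session key sizes in bytes for the AES_CM_128_HMAC_SHA1_* suites.
KEY_BYTES = {"encryption": 16, "auth": 20, "salt": 14}


def prf_input(label: int, master_salt: bytes, index: int = 0, kdr: int = 0) -> int:
    """x = (label || r) XOR master_salt, where r = index DIV kdr (r = 0 if kdr is 0)."""
    if len(master_salt) != 14:
        raise ValueError("master salt must be 112 bits (14 bytes)")
    r = index // kdr if kdr else 0
    key_id = (label << 48) | r  # 8-bit label followed by a 48-bit r
    return key_id ^ int.from_bytes(master_salt, "big")


def derive_session_key(encrypt_block: BlockCipher, master_key: bytes, master_salt: bytes,
                       name: str, index: int = 0, kdr: int = 0) -> bytes:
    """PRF_n(k_master, x): the first n bytes of the AES-CM keystream with IV = x * 2^16."""
    n = KEY_BYTES[name.split("_", 1)[1]]
    iv = prf_input(LABELS[name], master_salt, index, kdr) << 16
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += encrypt_block(master_key, ((iv + counter) % (1 << 128)).to_bytes(16, "big"))
        counter += 1
    return bytes(out[:n])
```

With the default key derivation rate of 0, the six session keys are derived once per master key, which is why delivering a single master key and salt is enough for both SRTP and SRTCP.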
The master key itself is exchanged during call setup, through the SIP signaling layer. Two mechanisms are common:
SDES (Session Description Protocol Security Descriptions) embeds the master key and master salt, base64-encoded, directly in the SDP body of the SIP INVITE and 200 OK messages, inside an a=crypto attribute. This is the simpler approach and is widely used when the SIP signaling itself is encrypted with TLS. Without TLS, SDES exposes the master key in cleartext signaling, which defeats the purpose.
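To see why SDES depends on TLS, it helps to look at what the a=crypto line contains. The sketch below parses the common form `a=crypto:<tag> <suite> inline:<key||salt>[|lifetime][|MKI:length]` and recovers the 16-byte master key and 14-byte master salt for the two suites discussed above. It handles a single inline key only, and `parse_crypto_attribute` is a hypothetical helper, not a library API. Anyone who can read this line in the SDP can do the same.

```python
import base64
from dataclasses import dataclass
from typing import Optional

# Master key and master salt sizes (bytes) for the two suites discussed above.
SUITE_KEY_SALT_BYTES = {
    "AES_CM_128_HMAC_SHA1_80": (16, 14),
    "AES_CM_128_HMAC_SHA1_32": (16, 14),
}


@dataclass
class SdesCrypto:
    tag: int
    suite: str
    master_key: bytes
    master_salt: bytes
    lifetime: Optional[str]  # e.g. "2^20", if present
    mki: Optional[str]       # e.g. "1:4" (MKI value:length), if present


def parse_crypto_attribute(line: str) -> SdesCrypto:
    """Parse one SDES a=crypto line; only the first inline key is read."""
    prefix = "a=crypto:"
    if not line.startswith(prefix):
        raise ValueError("not an SDP crypto attribute")
    fields = line[len(prefix):].split()
    if len(fields) < 3:
        raise ValueError("expected tag, crypto suite and key parameters")
    tag, suite, key_params = int(fields[0]), fields[1], fields[2].split(";")[0]
    if suite not in SUITE_KEY_SALT_BYTES:
        raise ValueError(f"unsupported crypto suite: {suite}")
    if not key_params.startswith("inline:"):
        raise ValueError("expected the 'inline:' key method")
    key_b64, *extras = key_params[len("inline:"):].split("|")
    key_salt = base64.b64decode(key_b64, validate=True)
    key_len, salt_len = SUITE_KEY_SALT_BYTES[suite]
    if len(key_salt) != key_len + salt_len:
        raise ValueError("key||salt has the wrong length for this suite")
    lifetime = next((e for e in extras if ":" not in e), None)
    mki = next((e for e in extras if ":" in e), None)
    return SdesCrypto(tag, suite, key_salt[:key_len], key_salt[key_len:], lifetime, mki)
```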
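DTLS-SRTP (Datagram Transport Layer Security for SRTP) performs a DTLS handshake directly on the media path to establish keys. This is more robust because the keys themselves never travel in signaling, so the exchange does not depend on signaling-layer encryption. The SDP carries only a fingerprint of each endpoint’s DTLS certificate, which still needs integrity protection so that an attacker cannot substitute their own. DTLS-SRTP is mandatory for WebRTC and is increasingly adopted in enterprise VoIP.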
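The link between signaling and the DTLS handshake is the certificate fingerprint. The sketch below builds an SDP `a=fingerprint:sha-256` line (SHA-256 of the DER-encoded certificate, written as colon-separated hex pairs) and checks a peer certificate against it after the handshake. It assumes SHA-256 fingerprints only, and the certificate bytes would come from whatever DTLS library the endpoint uses; the function names are hypothetical.

```python
import hashlib
import hmac


def sdp_fingerprint(cert_der: bytes) -> str:
    """Build an SDP fingerprint attribute for a DER-encoded certificate."""
    digest = hashlib.sha256(cert_der).hexdigest().upper()
    pairs = ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))
    return f"a=fingerprint:sha-256 {pairs}"


def fingerprint_matches(sdp_line: str, peer_cert_der: bytes) -> bool:
    """Compare the fingerprint learned from signaling with the certificate
    actually presented in the DTLS handshake."""
    prefix = "a=fingerprint:"
    if not sdp_line.startswith(prefix):
        return False
    try:
        hash_name, hex_pairs = sdp_line[len(prefix):].split(None, 1)
        claimed = bytes.fromhex(hex_pairs.replace(":", "").strip())
    except ValueError:
        return False
    if hash_name.lower() != "sha-256":
        return False  # this sketch only handles sha-256
    return hmac.compare_digest(claimed, hashlib.sha256(peer_cert_der).digest())
```

The fingerprint reveals nothing about the SRTP keys, which is why DTLS-SRTP does not need encrypted signaling; it does need signaling that cannot be altered in transit.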

SRTP vs. RTP: What Changes?

Property | RTP | SRTP
Payload encryption | None | AES-128 (Counter Mode)
Packet authentication | None | HMAC-SHA1 (80-bit or 32-bit tag)
Replay protection | None | Sliding window on sequence numbers
Key management | Not applicable | KDF from master key; SDES or DTLS-SRTP exchange
Bandwidth overhead | Baseline | A few percent: authentication tag plus optional MKI field (about 2 to 5% for G.711)
Latency impact | Baseline | Negligible (stream cipher, no round-trip handshake per packet)
Packet loss tolerance | High | Equally high (Counter Mode allows independent decryption)
Transport | UDP | UDP (same ports, same paths)

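The bandwidth overhead comes from the authentication tag appended to each packet (4 or 10 bytes, depending on the crypto suite) and the optional Master Key Identifier (MKI) field. A G.711 call using 20 ms packetization sends 50 packets per second in each direction, so the 32-bit tag adds 4 × 50 × 8 = 1.6 kbps per direction and the default 80-bit tag adds 10 × 50 × 8 = 4 kbps. Against roughly 80 kbps of G.711 traffic at the IP layer, that is about 2% and 5% respectively. In practice, this is imperceptible.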
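The arithmetic above is easy to reproduce for other codecs or packet sizes. This small sketch assumes IPv4, UDP, and a 12-byte RTP header with no header extensions, and it ignores link-layer framing, so absolute figures on the wire will be somewhat higher; the function name is illustrative.

```python
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP, no link-layer framing


def srtp_overhead(payload_bytes: int, ptime_ms: int, tag_bytes: int, mki_bytes: int = 0):
    """Return (extra kbps, extra percent) per direction added by SRTP's tag and MKI."""
    packets_per_second = 1000 / ptime_ms
    base_kbps = (payload_bytes + IP_UDP_RTP_HEADER_BYTES) * 8 * packets_per_second / 1000
    extra_kbps = (tag_bytes + mki_bytes) * 8 * packets_per_second / 1000
    return extra_kbps, 100 * extra_kbps / base_kbps


# G.711 at 20 ms: 160-byte payload, 50 packets per second, 80 kbps at the IP layer.
for suite, tag in (("AES_CM_128_HMAC_SHA1_80", 10), ("AES_CM_128_HMAC_SHA1_32", 4)):
    kbps, pct = srtp_overhead(payload_bytes=160, ptime_ms=20, tag_bytes=tag)
    print(f"{suite}: +{kbps:.1f} kbps per direction ({pct:.1f}%)")
# AES_CM_128_HMAC_SHA1_80: +4.0 kbps per direction (5.0%)
# AES_CM_128_HMAC_SHA1_32: +1.6 kbps per direction (2.0%)
```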
SRTP was specifically engineered to add security without degrading real-time performance. It reuses the same UDP transport, the same port allocations, and the same RTP header structure. Network infrastructure that routes RTP traffic will route SRTP traffic identically.

TLS + SRTP: Complete VoIP Security

A common misconception is that TLS on SIP signaling alone secures VoIP. It does not. A VoIP call has two distinct data flows that each require their own protection:
Signaling (SIP) covers the SIP messages that set up, modify, and tear down calls. These contain called/calling numbers, IP addresses, SDP with codec and key information, and routing metadata. SIP signaling is encrypted with TLS (Transport Layer Security), typically on port 5061.
Media (RTP/SRTP) is the actual voice or video packets. These are encrypted with SRTP on dynamically negotiated UDP ports.
If you encrypt signaling with TLS but leave media as plain RTP, an attacker cannot read the SIP messages that reveal who called whom, but can still capture and listen to the conversation. Conversely, if you encrypt media with SRTP but send SIP in the clear while exchanging keys with SDES, an attacker can read the master key from the SDP and decrypt the media anyway.
Both layers must be encrypted together. TLS protects the signaling. SRTP protects the media. And the device that enforces both at the network boundary is the Session Border Controller.

The SBC’s Role in SRTP

In most real-world VoIP deployments, the SBC sits at the network edge between your internal voice infrastructure and external networks: SIP trunk providers, Microsoft Teams, remote users, or peering partners. The SBC is where encryption policy gets enforced.
An SBC operating as a Back-to-Back User Agent (B2BUA) fully terminates and re-originates both signaling and media on each side of the connection. This architecture gives it three capabilities that are critical for SRTP deployment:

SRTP Relay

When both sides of a call support SRTP, the SBC relays encrypted media between them. The SBC negotiates the crypto parameters (cipher suite, keys) independently on each leg, decrypts incoming SRTP, and re-encrypts it for the outbound leg. This maintains encryption while still allowing the SBC to enforce media policy, perform SIP normalization, and generate call detail records.

RTP-to-SRTP Conversion

This is where the SBC becomes essential for mixed environments. Many legacy IP-PBXs, older SIP phones, and some SIP trunk providers only support plain RTP. Modern platforms like Microsoft Teams and WebRTC applications require SRTP.
The SBC bridges this gap by accepting plain RTP from the legacy side, encrypting it into SRTP, and forwarding it to the SRTP-required destination. On the return path, the SBC decrypts SRTP back to plain RTP for the legacy endpoint. The legacy equipment never needs to change.
This RTP-to-SRTP conversion is what allows organizations to connect existing voice infrastructure to Teams Direct Routing, cloud contact centers, and other platforms that mandate encryption, without replacing hardware or retraining staff.
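Part of the conversion happens in signaling. The sketch below shows only the SDP side of it, assuming SDES keying on the SRTP-facing leg: each RTP/AVP media line is switched to the RTP/SAVP profile and given its own a=crypto line with freshly generated key material, which the SBC keeps in order to encrypt that leg. It is a hypothetical illustration, not ProSBC’s implementation; a real SBC also rewrites connection addresses and ports for media anchoring, processes the answer, and would use DTLS-SRTP (the UDP/TLS/RTP/SAVP profile with a fingerprint) toward WebRTC peers.

```python
import base64
import secrets
from typing import List, Optional, Tuple


def to_srtp_offer(rtp_sdp: str, suite: str = "AES_CM_128_HMAC_SHA1_80") -> Tuple[str, List[bytes]]:
    """Rewrite a plain-RTP SDP body for the SRTP leg of a B2BUA call.

    Returns the rewritten SDP and the 30-byte master key||salt generated for
    each converted media section.
    """
    out: List[str] = []
    keys: List[bytes] = []
    pending_crypto: Optional[str] = None  # a=crypto line for the current media section

    for line in rtp_sdp.splitlines():
        if line.startswith("m="):
            if pending_crypto:
                out.append(pending_crypto)  # close the previous media section
            pending_crypto = None
            if " RTP/AVP " in line:
                line = line.replace(" RTP/AVP ", " RTP/SAVP ", 1)
                key_salt = secrets.token_bytes(30)  # 16-byte master key + 14-byte salt
                keys.append(key_salt)
                encoded = base64.b64encode(key_salt).decode()
                pending_crypto = f"a=crypto:1 {suite} inline:{encoded}"
        out.append(line)
    if pending_crypto:
        out.append(pending_crypto)
    return "\r\n".join(out) + "\r\n", keys
```

Generating a separate key per media section keeps audio and video streams from sharing key material, and the legacy leg continues to see its original RTP/AVP SDP unchanged.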

Topology Hiding with Encryption Context

The SBC hides internal network IP addresses from external parties. Combined with SRTP, this means external entities see neither the internal network topology nor the media content. The SBC presents its own IP addresses to the outside world and manages separate SRTP crypto contexts for each leg of the call.

When Do You Need SRTP?

The short answer: always, when feasible. Media encryption has negligible performance cost and significant security benefit. But certain scenarios make it mandatory rather than recommended.
Microsoft Teams Direct Routing requires SRTP for all media. If your SBC cannot negotiate SRTP toward Teams, Teams calls will fail; if your internal side uses plain RTP, the SBC must also perform RTP-to-SRTP conversion.
WebRTC applications all use DTLS-SRTP. If you are integrating browser-based communication with your voice network, SRTP is not optional.
Healthcare (HIPAA) treats voice calls carrying patient information as ePHI in transit. SRTP provides the encryption layer needed to meet the HIPAA Security Rule’s transmission security requirements.
Financial services (PCI DSS) requires strong cryptography for cardholder data sent over open, public networks, including spoken card numbers over VoIP. SRTP protects the media leg.
Remote and hybrid workers present a growing exposure point. When VoIP traffic leaves the corporate network and traverses the public internet to reach remote employees, encryption prevents interception on uncontrolled network segments. SBCs with SIP/TLS and SRTP support protect these connections at the access edge.
Any traffic crossing the public internet should be encrypted. If voice packets traverse any network segment you do not physically control, encryption is a baseline security measure.

Common SRTP Misconceptions

“SRTP replaces TLS” is a common assumption, but it is incorrect. SRTP encrypts media. TLS encrypts signaling. They protect different data flows and both are needed for complete VoIP security. Without TLS, the SDES key exchange in SIP signaling exposes the SRTP master key in cleartext.
“SRTP adds significant latency” is another misconception. AES in Counter Mode works as a stream cipher and processes each packet independently, without requiring feedback from previous packets. The bandwidth overhead is a few percent (1.6 to 4 kbps per direction for G.711 at 20 ms packetization, depending on tag length) and the latency impact is negligible. There is no perceptible quality degradation.
“All VoIP equipment supports SRTP” is not the case. Many legacy IP-PBXs, analog telephone adapters, and older SIP phones support only plain RTP. An SBC with RTP-to-SRTP conversion capability bridges this gap without requiring equipment replacement.
“SRTP provides end-to-end encryption” is true only in some deployments. End-to-end SRTP is technically possible but uncommon. In most enterprise deployments, SRTP operates hop by hop: the SBC decrypts and re-encrypts media at the network boundary. This allows the SBC to enforce security policy, generate CDRs, and perform media operations. True end-to-end encryption (where only the two endpoints hold keys) is rare in production VoIP because it prevents all intermediate processing.

Getting Started with SRTP

Deploying SRTP across your VoIP network is a four-step process:

  1. Audit your current encryption status. Identify which trunks, endpoints, and platforms currently use RTP versus SRTP. Note which connections cross the public internet and which stay within your controlled network. Prioritize internet-facing and compliance-sensitive connections.
  2. Enable TLS on SIP signaling. SRTP key exchange via SDES depends on TLS to protect the master key in transit. Configure TLS on your SBC for all SIP connections, starting with internet-facing trunks. Use port 5061 (the standard SIP-over-TLS port).
  3. Configure SRTP on your SBC. Enable SRTP for each trunk group or Network Access Point. For mixed environments, configure the SBC to perform RTP-to-SRTP conversion so that legacy endpoints can connect to SRTP-mandatory platforms without modification.
  4. Verify with packet capture. Use Wireshark or your SBC’s built-in capture tools to confirm that media packets on SRTP-enabled trunks are encrypted. Verify that RTP-to-SRTP conversion is working correctly on mixed legs. Check that SRTCP is also encrypted alongside SRTP.

SRTP in the Broader VoIP Security Stack

SRTP is one layer in a defense-in-depth approach to VoIP security. A hardened voice network combines:

  • SRTP for media encryption
  • TLS for signaling encryption
  • SBC with B2BUA architecture for topology hiding, access control, and encryption enforcement
  • DoS/DDoS protection at the SBC to mitigate SIP flood attacks
  • Dynamic blacklisting and call access control lists to block malicious traffic
  • STIR/SHAKEN for caller identity authentication

Each layer addresses a different threat vector. SRTP specifically addresses the confidentiality and integrity of voice and video media in transit. Combined with the other layers, it forms part of a comprehensive VoIP security posture.

Frequently Asked Questions

What is the difference between SRTP and RTP?

RTP transmits voice and video in the clear without encryption or authentication. SRTP adds AES-128 Counter Mode encryption of the media payload, HMAC-SHA1 authentication of each packet, and replay protection using a sliding window of sequence numbers. Both use UDP transport. The RTP header itself stays unencrypted but is covered by the authentication tag, and the added bandwidth is only a few percent for a typical G.711 call.

Does SRTP replace TLS?

No. SRTP and TLS protect different data flows and both are required for complete VoIP security. SRTP encrypts media (the actual voice and video packets on UDP). TLS encrypts signaling (the SIP messages that set up and manage calls on port 5061). Without TLS, the SDES key exchange in SIP signaling exposes the SRTP master key in cleartext, defeating the purpose of media encryption.

Is SRTP required for Microsoft Teams?

Yes. Microsoft Teams Direct Routing requires SRTP for all media. If your SBC cannot negotiate SRTP, Teams calls will fail. If your existing voice infrastructure only supports plain RTP, the SBC must also perform RTP-to-SRTP conversion to bridge it with Teams’ SRTP-mandatory requirements.

Does SRTP add noticeable latency to calls?

No. SRTP uses AES Counter Mode, a stream cipher that processes each packet independently. The bandwidth overhead is small (1.6 to 4 kbps per direction on a G.711 call with 20 ms packetization, depending on tag length) and the latency impact is negligible. There is no perceptible quality degradation.

Can legacy equipment that only supports RTP connect to SRTP-mandatory platforms?

Yes, through an SBC with RTP-to-SRTP conversion. The SBC accepts plain RTP from the legacy side, encrypts it into SRTP, and forwards it to the destination. The legacy equipment does not need to change. This is how organizations connect existing IP-PBXs and SIP phones to Teams, WebRTC platforms, and other SRTP-required services.

Conclusion

SRTP extends RTP with AES encryption, HMAC-SHA1 authentication, and replay protection. It adds only a few percent of bandwidth overhead with negligible latency impact. TLS and SRTP together provide complete VoIP security: TLS for signaling, SRTP for media.
The SBC is the enforcement point. It negotiates SRTP parameters, converts RTP to SRTP for legacy endpoints, and maintains separate encryption contexts on each call leg. For organizations connecting to Microsoft Teams, WebRTC platforms, or any SRTP-mandatory service, an SBC with RTP-to-SRTP conversion capability is the bridge between existing infrastructure and modern security requirements.

Secure Your VoIP Media with ProSBC

ProSBC includes SRTP relay and RTP-to-SRTP conversion as standard features across all deployment scenarios. The B2BUA architecture negotiates encryption independently on each call leg: SRTP toward Teams, WebRTC, or any SRTP-mandatory platform on one side, and whatever your carrier supports on the other. Legacy endpoints connect without modification.
SIP-over-TLS, DoS/DDoS protection, dynamic blacklisting, and topology hiding are included in every deployment. For Microsoft Teams Direct Routing environments, ProSBC handles the full encryption and SIP normalization stack that Teams requires.
ProSBC is available on Azure, AWS, VMware, KVM/Proxmox, and bare metal, deployable wherever your network edge is.