VoIP Troubleshooting Guide: Common Issues, First-Pass Diagnostics, and Where to Look Next


When a customer reports that their VoIP is broken, the first ten minutes set the tone for the rest of the ticket. The right move is rarely to start changing settings. It is to capture enough evidence to isolate where the failure lives, then apply the smallest possible fix. This guide is the executive overview of VoIP troubleshooting: the symptoms operators see most often, the diagnostic step that isolates each one quickly, and the deeper companion guides to follow when a symptom needs a true deep dive.

The structure is deliberately symptom-first. The article maps each symptom to its most likely causes, the single piece of evidence that confirms or rules them out, and the fix paths that resolve the call.

Key Terms and Concepts

A quick-reference glossary for terms used throughout this article.

SIP (Session Initiation Protocol): The signaling protocol that sets up, modifies, and tears down voice calls. Almost every VoIP issue that prevents a call from connecting is a SIP problem.

RTP (Real-time Transport Protocol): Carries the actual voice media between endpoints once SIP has set the call up. One-way audio and call quality issues show up in the RTP stream, even when the root cause is a bad SDP carried in the SIP signaling.

SDP (Session Description Protocol): The payload inside SIP messages that advertises which IP, port, codec, and encryption parameters each side will use for media. A misconfigured SDP is the most common root cause of one-way audio.

B2BUA (Back-to-Back User Agent): An SBC architecture that fully terminates the SIP session on one leg and re-originates a new session on the other, giving the SBC complete control over headers and media on both sides.

Jitter: The variation in the time delay (or latency) of data packets as they travel across a network.

NAP (Network Access Point): The TelcoBridges term for a configured SIP trunk or peer group. Routing, codec policy, encryption settings, and SIP header rules are configured per NAP.

CDR (Call Detail Record): The per-call log the SBC emits after the call ends, capturing duration, codecs, MOS score, hangup cause, and the signaling path. CDRs are the first place to look for patterns across calls.

PCAP (Packet Capture): A raw recording of every IP packet on a network interface, readable in Wireshark or sngrep. Nothing beats a PCAP for confirming what the SBC actually sent and received.

MOS (Mean Opinion Score): A 1-to-5 quality score derived from jitter, packet loss, and latency, with anything under 3.5 generally indicating a quality problem worth investigating.

DTMF (Dual-Tone Multi-Frequency): The in-band or signaled tone sequence callers generate when pressing keys on a phone. DTMF failures are usually mode mismatches between in-band, RFC 2833, and SIP INFO.

The 5-Minute First Pass

Before changing a single setting, capture the evidence. Reconfiguring before the root cause is understood often prolongs the outage instead of ending it.

The five pieces of information to gather first are: when the call was placed (timestamp to the second), the direction (inbound or outbound from the operator’s perspective), the originating and terminating numbers, whether transcoding was required, and the exact error the user saw or heard. If the user “just heard nothing,” that is a different problem from “got a fast busy tone,” which is different again from “the call connected and then dropped after twenty seconds.”

The next step is to isolate the failure along four axes. Was it a signaling failure (the call never connected) or a media failure (the call connected but audio failed)? Did the failure happen on the originating side of the SBC or the terminating side? Was it an on-net call (between two endpoints the operator controls) or off-net (to an external destination)? And was it a one-off, an intermittent pattern, or a systematic failure affecting every call?
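One way to keep the five facts and the four isolation axes together on a ticket is a small record like the sketch below. The class and field names are hypothetical, chosen only for illustration; they are not part of any SBC or ticketing API.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Plane(Enum):
    SIGNALING = "signaling"  # the call never connected
    MEDIA = "media"          # the call connected but audio failed


class Scope(Enum):
    ONE_OFF = "one-off"
    INTERMITTENT = "intermittent"
    SYSTEMATIC = "systematic"


@dataclass
class FirstPass:
    # The facts to gather before touching any setting.
    placed_at: datetime          # timestamp to the second
    direction: str               # "inbound" or "outbound", operator's view
    calling_number: str
    called_number: str
    transcoding_required: bool
    user_saw_or_heard: str       # exact wording, e.g. "fast busy tone"
    # The four isolation axes.
    plane: Plane
    failing_side: str            # "originating" or "terminating" side of the SBC
    on_net: bool
    scope: Scope

    def summary(self) -> str:
        """One-line summary suitable for the top of a ticket."""
        net = "on-net" if self.on_net else "off-net"
        stamp = self.placed_at.isoformat(timespec="seconds")
        return (f"{stamp} {self.direction} "
                f"{self.calling_number} -> {self.called_number}: "
                f"{self.plane.value} failure, {self.failing_side} side, "
                f"{net}, {self.scope.value}")
```

Filling in every field before opening a configuration screen is the point: a record with gaps is a signal that more evidence is needed first.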

The single most diagnostic question at this stage is almost always: did this work before? If yes, the next question is what changed. A certificate rotation, a routing rule update, an upstream carrier change, a firewall policy push, or a software upgrade is the cause of a disproportionate share of VoIP incidents. The change log usually finds the issue faster than the trace does.

Symptom 1: Calls Will Not Connect

The call fails before audio even has a chance. No ringing, no answer, no media.

The most likely causes are a SIP signaling failure (the INVITE never gets a 200 OK), a registration that has not happened or has expired (the endpoint cannot be reached because it is not registered), an ACL or fraud rule blocking the source IP or the called number, a routing rule that does not match the dialed pattern, or a TLS handshake failure on an encrypted trunk.

The first diagnostic step is to read the SIP response code on the failing INVITE. A 4xx response means the request had a client-side problem (auth, permission, format, codec). A 5xx response means the server was unable to process the request even though the request itself was valid (timeout, internal error, no resources). A 6xx response is a global decline that should be treated as final. For the full catalog, follow the companion SIP signaling fundamentals reference; the patterns that repeat most in real operations are below.

A 403 Forbidden almost always means access control: either the credentials failed authentication, or a fraud rule has blacklisted the destination. A 404 Not Found usually means the dialed number did not match any routing rule. A 408 Request Timeout means the next hop did not respond, often because the peer is unreachable on the network or has stopped answering OPTIONS. A 488 Not Acceptable Here typically points to a codec mismatch: the incoming SDP did not offer any codec the terminating side will accept. A 503 Service Unavailable means only that the next hop is not taking calls at this time. Overload is a common reason, but it is not the only one, and assuming it without evidence can send the investigation in the wrong direction.
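The response-code reading described here can be sketched as a small lookup. This is a minimal illustration: the function names and the wording of the hints are assumptions, and the table covers only the five codes discussed in this section.

```python
# First-pass hints for the response codes discussed in this section.
LIKELY_CAUSE = {
    403: "access control: failed authentication or a fraud rule",
    404: "dialed number did not match any routing rule",
    408: "next hop did not respond (peer unreachable or silent)",
    488: "codec mismatch: no offered codec was acceptable",
    503: "next hop is not taking calls (overload is possible, not certain)",
}


def response_class(code: int) -> str:
    """Map a SIP final response code to its broad failure class."""
    if 400 <= code <= 499:
        return "client-side problem with the request"
    if 500 <= code <= 599:
        return "server could not process a valid request"
    if 600 <= code <= 699:
        return "global decline, treat as final"
    return "not a failure response"


def first_pass_hint(code: int) -> str:
    """Combine the response class with the specific hint, if one is known."""
    detail = LIKELY_CAUSE.get(code, "see the full response-code reference")
    return f"{code}: {response_class(code)}; {detail}"
```

For example, `first_pass_hint(488)` returns a line that names both the 4xx class and the codec mismatch, which is usually enough to decide which leg's SDP to read first.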

Symptom 2: Audio Connection Issues — No Audio and One-Way Audio

The call signaled successfully (200 OK was returned, ACK was sent), but audio is missing in one or both directions. Users typically describe it as “I can hear them but they can’t hear me” or “the call connects to dead air.”

The four likely root causes are a firewall blocking the RTP path, an SDP advertising the wrong IP address or port range, an asymmetric routing problem where RTP goes one way over one path and the other way over a different path, or an SRTP key mismatch where one side cannot decrypt the other’s media.

The first diagnostic step is to capture both signaling and media on the SBC interface. If RTP packets are flowing in one direction only, the most likely cause is a firewall blocking the direction that is silent. If RTP is not flowing at all, the issue is usually in the SDP: a private IP address advertised where a public one was expected, or a media port outside the firewall’s allowed range. If RTP is flowing in both directions but the audio is silent or garbled, the issue is encryption or codec negotiation.
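The decision above can be written down as a short sketch. It assumes the per-direction RTP packet counts have already been read from a capture, and it uses Python's standard `ipaddress` module to flag a private address in the SDP `c=` line. The function names are hypothetical.

```python
import ipaddress
from typing import Optional


def sdp_connection_ip(sdp: str) -> Optional[str]:
    """Return the address from the first c= line, or None if absent."""
    for line in sdp.splitlines():
        if line.startswith("c="):
            # c=<nettype> <addrtype> <connection-address>
            parts = line[2:].split()
            if len(parts) == 3:
                return parts[2].split("/")[0]
    return None


def advertises_private_ip(sdp: str) -> bool:
    """True when the SDP advertises a private (non-routable) media address."""
    addr = sdp_connection_ip(sdp)
    if addr is None:
        return False
    try:
        return ipaddress.ip_address(addr).is_private
    except ValueError:
        return False  # an FQDN or malformed address; cannot judge here


def diagnose_media(rtp_a_to_b: int, rtp_b_to_a: int) -> str:
    """Apply the three-way RTP check to per-direction packet counts."""
    if rtp_a_to_b == 0 and rtp_b_to_a == 0:
        return "no RTP at all: check the SDP (advertised IP, media ports)"
    if rtp_a_to_b == 0 or rtp_b_to_a == 0:
        silent = "A->B" if rtp_a_to_b == 0 else "B->A"
        return f"RTP missing {silent}: suspect a firewall on that path"
    return "RTP both ways: check SRTP keys and codec negotiation"
```

A private address in the SDP is only a problem when the far end is on the public side of a NAT, so `advertises_private_ip` is a prompt to look closer, not a verdict.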

Symptom 3: Audio Quality Issues

The call connects and both sides hear audio, but the audio is bad: choppy, robotic, full of gaps or echo, or simply below the speech quality the operator’s SLA promised.

The likely causes are packet loss (typically anything above 1% becomes audible), jitter beyond the receiver’s buffer size, end-to-end latency above 150 ms one-way, codec transcoding artifacts (especially when a low-bitrate codec is involved), or echo from a hybrid leg somewhere in the path.

The first diagnostic step is to check the SBC call trace tool. If MOS is below 3.5, the codec is G.711, and the call has more than a few seconds of audio, the network is the suspect, not the SBC. The breakdown of MOS into its constituent parts (jitter, packet loss, latency) points at the specific layer that is failing. The full thresholds, per-trunk monitoring methodology, and metrics architecture are covered in the companion VoIP monitoring best practices article, which should be the operational reference for anyone building a quality monitoring practice.
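The thresholds mentioned in this section can be applied mechanically to a per-call sample, as in the sketch below. The loss, latency, and MOS limits come from the text; the jitter-buffer size (40 ms) and the minimum audio length (5 s) are illustrative assumptions that should be replaced with the values in use on the network. The field names are hypothetical and do not mirror any particular CDR format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QualitySample:
    mos: float
    loss_pct: float
    jitter_ms: float
    one_way_latency_ms: float
    codec: str
    audio_seconds: float


def quality_findings(s: QualitySample,
                     jitter_buffer_ms: float = 40.0,   # assumed buffer size
                     min_audio_seconds: float = 5.0    # assumed "a few seconds"
                     ) -> List[str]:
    """Return the thresholds this call crossed, in a fixed order."""
    findings = []
    if s.loss_pct > 1.0:
        findings.append("packet loss above 1%")
    if s.jitter_ms > jitter_buffer_ms:
        findings.append("jitter exceeds the receiver's buffer")
    if s.one_way_latency_ms > 150.0:
        findings.append("one-way latency above 150 ms")
    if (s.mos < 3.5 and s.codec.upper() == "G.711"
            and s.audio_seconds >= min_audio_seconds):
        findings.append("MOS below 3.5 on G.711: suspect the network")
    return findings
```

An empty list does not prove the call was good; it only means none of these first-pass thresholds were crossed.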

In real operations, three patterns produce most quality complaints. The first is WAN congestion on a specific carrier route, usually visible as packet loss spiking on calls routed through that trunk while other trunks stay clean. The second is QoS that was supposed to be applied but is not. The third is a jitter buffer sized for a different network profile than the one in use, either too small (so jitter becomes audible drops) or too large (so latency becomes audible).

Symptom 4: Calls Drop Mid-Conversation

The call connected, audio was flowing, and then the call ended without either party hanging up. The user description is usually “we got cut off.”

The likely causes are session timer expiration without a refresh, an RTP keepalive failure causing one side to time out the media, a NAT binding that timed out and dropped the path, a peer-initiated BYE triggered by silence detection (some PBXs cut calls they think have ended), or a network blip on a TLS-encrypted trunk that broke the secure channel.

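The most diagnostic question is exactly when the call ended. If calls consistently drop at almost exactly the same duration, the cause is a timer somewhere in the path. A drop about thirty-two seconds after answer usually points at a 200 OK that was never acknowledged: 32 seconds is the 64 × T1 transaction timeout in RFC 3261, after which the answering side stops waiting for the ACK and ends the call. Drops at a fixed interval of minutes, often 15 or 30, more typically point at a session refresh not being honoured. If the drop time is random, the cause is network-driven: a flapping link, a transient firewall reload, or an upstream BGP convergence event.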
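The "same duration versus random duration" test can be run over durations pulled from CDRs. The sketch below is a minimal illustration: the two-second tolerance and the three-call minimum are assumptions, and the function name is hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence


def drop_pattern(durations_s: Sequence[float],
                 tolerance_s: float = 2.0,  # assumed spread for "same duration"
                 min_calls: int = 3         # assumed minimum sample size
                 ) -> str:
    """Classify dropped-call durations as timer-like or random."""
    if len(durations_s) < min_calls:
        return "not enough dropped calls to judge"
    if pstdev(durations_s) <= tolerance_s:
        return f"timer-like: drops cluster near {mean(durations_s):.0f} s"
    return "random: look at the network path"
```

Run it separately per trunk or per destination; mixing calls from unrelated paths can hide a tight cluster inside a wide spread.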

Session timer issues are the most common subset of mid-call drops and the easiest to fix from the SBC. The normalization rule is to insert a Session-Expires header with a value the destination will accept, or to strip the timer entirely on legs that handle it poorly.
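To make the normalization concrete, the sketch below rewrites the Session-Expires header on a raw SIP message held as text. It is a plain Python illustration of the idea only, not the ProSBC routing-script API: the function name is hypothetical, folded header lines are not handled, and a production rule would also need to consider related headers such as Min-SE.

```python
from typing import Optional


def set_session_expires(message: str, seconds: Optional[int] = None) -> str:
    """Replace Session-Expires with `seconds`, or strip it when None."""
    head, sep, body = message.partition("\r\n\r\n")
    lines = head.split("\r\n")
    kept = [lines[0]]  # request or status line stays first
    for line in lines[1:]:
        name = line.split(":", 1)[0].strip().lower()
        # "x" is the compact form of Session-Expires (RFC 4028).
        if name in ("session-expires", "x"):
            continue
        kept.append(line)
    if seconds is not None:
        kept.append(f"Session-Expires: {seconds}")
    return "\r\n".join(kept) + sep + body
```

Calling `set_session_expires(msg, 1800)` inserts a fixed value, while `set_session_expires(msg)` strips the timer for legs that handle it poorly.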

Symptom 5: Registration Loops or Endpoints Offline

The endpoint is showing offline in the management dashboard.

The likely causes are an ACL blocking the REGISTER traffic (often because a fraud rule has mistaken legitimate re-registration for a scanning attack), credential mismatch after a password rotation, a NAT keepalive interval longer than the binding timeout, scanner or attack traffic confusing the detection logic, or a TLS certificate that has expired on either side.

The first diagnostic step is to pull the SBC logs for the affected endpoint. The logs show each registration attempt and how the SBC responded. A 401 challenge followed by a successful 200 OK is the healthy pattern. A 401 followed by another 401 indicates that the credentials in the retried REGISTER were not accepted, which typically points at a credential mismatch. The root cause has to be identified at the registrar: if the SBC is the registrar, its logs will be informative; if it is not, the SBC logs show only what passed through, and the registrar’s own logs are needed.
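The healthy and unhealthy patterns can be told apart from the sequence of response codes alone. The sketch assumes the codes for one endpoint's REGISTER attempts have already been extracted from the log in order; the function name and return strings are hypothetical.

```python
from typing import Sequence


def registration_pattern(codes: Sequence[int]) -> str:
    """Classify the ordered response codes seen for one REGISTER exchange."""
    codes = list(codes)
    if not codes:
        return "no responses logged: check ACLs and the network path"
    if codes[-1] == 200:
        return "healthy: registration accepted"
    if codes[-2:] == [401, 401]:
        return "repeated 401: credentials are not being accepted"
    return f"unclassified: last response was {codes[-1]}"
```

Given `[401, 200]` this reports the healthy pattern, and given `[401, 401]` it flags the credentials.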

The companion SIP DoS attack prevention guide covers the boundary between legitimate registration patterns and scanning attacks in depth, and is the reference to use when tuning thresholds.

Symptom 6: DTMF Failures

The call connects with good audio, but the IVR cannot read the digits the caller is pressing.

The likely causes are a DTMF mode mismatch (one side is sending in-band audio tones, the other is expecting RFC 2833 telephone-events or SIP INFO messages), codec transcoding stripping the DTMF on its way through, or a payload type mismatch on the telephone-event RTP payload (one side using payload type 101, the other expecting 96).

The first diagnostic step is to capture media and verify that what is being sent matches what should be sent. If the user presses a digit and the trace shows an RFC 2833 telephone-event packet with the right number, the issue is downstream. If the trace shows in-band audio tones but the destination negotiated RFC 2833, that is the mismatch. A B2BUA SBC with a transcoding unit can normalize between DTMF modes per leg, which is usually the fastest fix for mixed environments.
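Before reading the media itself, it often helps to compare what each side advertised for telephone-event in its SDP. The sketch below extracts the payload type from the `a=rtpmap` line on both legs; the function names are hypothetical. Differing payload types between offer and answer are not automatically an error, but they are worth confirming against what is actually on the wire.

```python
import re
from typing import Optional

# Matches e.g. "a=rtpmap:101 telephone-event/8000"
_TELEPHONE_EVENT = re.compile(r"^a=rtpmap:(\d+)\s+telephone-event/\d+",
                              re.IGNORECASE)


def telephone_event_pt(sdp: str) -> Optional[int]:
    """Return the RTP payload type advertised for telephone-event, if any."""
    for line in sdp.splitlines():
        match = _TELEPHONE_EVENT.match(line.strip())
        if match:
            return int(match.group(1))
    return None


def dtmf_check(offer_sdp: str, answer_sdp: str) -> str:
    """Compare telephone-event negotiation across the two SDP bodies."""
    offered = telephone_event_pt(offer_sdp)
    answered = telephone_event_pt(answer_sdp)
    if offered is None and answered is None:
        return "no telephone-event: DTMF is in-band or SIP INFO"
    if offered is None or answered is None:
        return "only one side advertises telephone-event: mode mismatch"
    if offered != answered:
        return f"payload types differ ({offered} vs {answered}): verify on the wire"
    return f"telephone-event negotiated on payload type {offered}"
```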

A specific case worth flagging: low-bitrate codecs (anything compressing aggressively) can strip in-band DTMF entirely. If the path includes a transcode and the source is sending in-band tones, the call cannot reliably carry DTMF. The fix is to negotiate RFC 2833 telephone-events end-to-end, or to keep the path in G.711 if DTMF must remain in-band.

Symptom 7: Fax Fails

Fax is brutally sensitive to packet loss, jitter, and codec choice. The likely causes are aggressive codec transcoding breaking the T.30 protocol, packet loss above the fax tolerance threshold (usually around 0.5%), no T.38 negotiation when both sides support it, or a jitter buffer eating the V.21 handshake tones at the start of the fax session.

The first diagnostic step is to identify whether the path is T.38 relay or G.711 passthrough. T.38 demodulates the fax signal into a digital protocol over UDPTL, which tolerates loss and jitter well. G.711 passthrough carries the analog fax tones inside the voice codec, which is fragile under any loss. If the path tried to negotiate T.38 but fell back to G.711, the SDP renegotiation usually shows the reason. The companion Fax over IP and T.38 guide covers the full diagnostic path, the fax-safe codec profile, and the SBC settings that make fax work reliably.
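Whether a given SDP describes T.38 relay or G.711 passthrough can be read from its `m=` lines, as in the sketch below. The function name and return labels are hypothetical. The G.711 check only looks for static payload types 0 (PCMU) and 8 (PCMA) among the offered formats, so it shows that G.711 is on offer, not which codec was finally selected.

```python
def fax_transport(sdp: str) -> str:
    """Classify an SDP body as T.38 relay, G.711 passthrough, or unknown."""
    audio_g711 = False
    for line in sdp.splitlines():
        if not line.startswith("m="):
            continue
        # m=<media> <port> <proto> <fmt> ...
        fields = line[2:].split()
        if len(fields) < 4 or fields[1] == "0":
            continue  # malformed line, or port 0 meaning the stream is refused
        media, proto, formats = fields[0], fields[2].lower(), fields[3:]
        if media == "image" and proto == "udptl" and formats[0].lower() == "t38":
            return "t38"
        if media == "audio" and {"0", "8"} & set(formats):
            audio_g711 = True
    return "g711-passthrough" if audio_g711 else "unknown"
```

An `m=image` line with port 0 next to a live `m=audio` line is the typical signature of a T.38 attempt that fell back to G.711.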

Symptom 8: Calls Do Not Reach the Specified Destination

Most calls succeed, but a specific number, prefix, or carrier consistently rejects calls.

The likely causes are a carrier-side rejection (caller ID not whitelisted, attestation level too low, geographic block), a number-format mismatch (E.164 versus national versus local, or extra digits in the From header), a per-NAP mapping rule that is missing for the new destination, or a STIR/SHAKEN attestation issue where calls signed at lower than A-level are being blocked by the terminating carrier.

The most useful diagnostic is to identify the SIP trunk or NAP the failed call went through, then re-route through an alternate. If the call succeeds on the alternate trunk, the original trunk’s normalization, attestation, or routing logic is the issue. If it still fails on the alternate, the destination side is rejecting based on something in the call (caller ID, attestation, blocklist).

If the failures cluster around recent regulatory changes, look at attestation. Several large US carriers downgrade or block calls signed at lower than A-level, and providers who relied on an upstream wholesale partner for signing have seen their effective attestation drop. The STIR/SHAKEN A-level attestation guide covers the operational path from C-level to A-level and the FCC requirements for each.

Symptom 9: Teams Direct Routing-Specific Issues

Microsoft Teams Direct Routing has its own set of failure patterns because Microsoft’s SBC requirements are strict and non-negotiable. SIP OPTIONS heartbeat failures, TLS handshake errors, FQDN mismatches between the SBC certificate and what is registered in Teams Admin Center, and SRTP negotiation issues are the most common.

A first-pass check is the Teams Admin Center status for the SBC. If it shows offline, the OPTIONS heartbeat is failing. The cause is usually a TLS handshake issue (expired certificate, missing intermediate certificate, or the upcoming Microsoft CA root trust list change), or a network path that has dropped between the SBC and Microsoft. The Microsoft 2026 CA root certificate update is the most important near-term failure scenario to verify against. For the full Direct Routing requirements and SBC selection criteria, refer to the Teams Direct Routing learning page.
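Two of the certificate checks, expiry and FQDN match, can be run offline against certificate details that have already been retrieved. The sketch assumes a dictionary in the shape returned by Python's `ssl.SSLSocket.getpeercert()`; the function names are hypothetical, and the wildcard handling is deliberately simplified to a single leading label.

```python
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter date (negative if expired)."""
    expires = ssl.cert_time_to_seconds(not_after)  # e.g. "Jun 15 12:00:00 2030 GMT"
    current = time.time() if now is None else now
    return (expires - current) / 86400.0


def fqdn_in_certificate(fqdn: str, cert: dict) -> bool:
    """True when the FQDN matches a DNS subjectAltName, exactly or by wildcard."""
    fqdn = fqdn.lower()
    parent = fqdn.partition(".")[2]
    for kind, value in cert.get("subjectAltName", ()):
        if kind != "DNS":
            continue
        value = value.lower()
        if value == fqdn or (value.startswith("*.") and value[2:] == parent):
            return True
    return False
```

A certificate that passes both checks can still fail the handshake because of a missing intermediate or an untrusted root, so these are a first filter, not a full validation.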

The Diagnostic Toolkit

Four tools cover the vast majority of VoIP troubleshooting work, and an operator who has all four ready before the first ticket comes in resolves issues several times faster than one who has to assemble them under pressure.

The SBC’s own call trace is the first stop for any signaling question, dumping a per-call SIP ladder showing every message exchanged on both legs. The SBC’s CDR output gives the post-call summary, including duration, codecs, MOS, and hangup cause. SNMP traps and a basic monitoring board cover the systematic patterns. And a PCAP capture on the SBC interface, opened in Wireshark, gives the raw truth when traces and CDRs disagree about what happened.

When to Escalate to Your Carrier

Three signals strongly suggest the issue is upstream rather than in the operator’s own infrastructure: multiple originating trunks all see the same failure to the same destination range; multiple destinations on a single trunk all fail at the same time; or the SBC’s SIP trace shows the failure response originating from the carrier side (5xx, 6xx, or a long delay followed by 408).

The package to give the carrier should always include a SIP trace covering the failing call from the SBC’s perspective, a PCAP from the SBC interface for the relevant timeframe, and a small set of concrete call examples (caller, callee, exact timestamp, duration, error). Carriers move much faster when the request includes the specific evidence they need to find the call in their own logs. Vague tickets that say “calls to area code 305 are failing” without timestamps and number examples sit in the queue.

How ProSBC Helps Resolve VoIP Issues Faster

ProSBC is built around the operational reality that voice issues happen and have to be diagnosed quickly. Per-call MOS scoring is calculated natively at the SBC, with no external probe required. ProSBC’s CDR output includes the signaling path, hangup cause, codec on each leg, and quality fields, in both text and RADIUS formats. Live Wireshark-compatible packet capture can be enabled per call or per NAP without restarting anything. SIP call trace is available through the web management interface.

The programmable Ruby routing engine is the leverage point for systemic issue patterns. When a specific carrier consistently sends malformed headers, transcoding artifacts, or unusual response codes, a routing script can normalize the pattern at the SBC layer rather than requiring every PBX downstream to handle it. The same engine supports filter-chain integration with TransNexus ClearIP, Neustar, SecureLogix, and YouMail for the fraud, STIR/SHAKEN, and reputation-scoring decisions that affect call completion.

For operators who find themselves running a troubleshooting practice rather than running their business, the TelcoBridges Managed Service takes the diagnostic work, configuration changes, and ongoing monitoring off the operator’s plate while leaving full visibility and control with the customer.

Resolve VoIP Issues Faster with ProSBC

ProSBC is a carrier-grade, software-based Session Border Controller designed with the diagnostic surface that real operations need. Per-call MOS, live packet capture, full CDR output, web-based SIP trace, and programmable normalization through the Ruby routing engine cover the troubleshooting workflow from first symptom to root cause.

For service providers running monitoring as a practice, MaaS (Monitoring as a Service) is available as a standalone product that brings the metrics into a managed dashboard. For operators who want the diagnostic burden lifted entirely, the Managed Service tier includes ProSBC+ with 1+1 high availability, 24×7 Level 3 support, ongoing configuration changes, and continuous monitoring.

ProSBC is available on AWS, Microsoft Azure, VMware, KVM/Proxmox, and bare metal. A 30-day free trial with 500 concurrent sessions provides a full diagnostic environment for evaluation, and the permanent 3-session ProSBC Lab license is available immediately for testing and validation work.
