SIP trunk redundancy: active-active vs active-passive configurations

Carrier-grade voice infrastructure is expected to be available around the clock. A single SIP trunk connecting your platform to a wholesale carrier is a single point of failure: one network event, one misconfigured session border controller (SBC), or one carrier maintenance window can take down your entire voice capability. Redundant SIP trunk architecture removes that single point of failure, but the two main approaches, active-active and active-passive, behave very differently and suit different operational requirements.

Why redundancy matters in wholesale voice

For BPOs, contact centres, and businesses running high call volumes, voice downtime has immediate commercial consequences. For carriers and resellers who have committed to SLAs with their own customers, a single-trunk architecture makes those SLAs difficult to honour. Wholesale voice contracts — including Infinititel's — specify uptime commitments that depend on redundant interconnect architecture on both sides of the relationship.

Active-passive configuration

In an active-passive configuration, one trunk carries all live traffic while a second trunk sits idle, ready to take over if the primary fails. Failover is triggered by a detection mechanism: typically SIP OPTIONS requests sent to the primary at regular intervals, or monitoring for registration failures and connection timeouts.
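
As a rough illustration of the detection logic, the Python sketch below counts consecutive failed OPTIONS probes against the primary and moves traffic to the secondary once a failure threshold is reached. The class name, trunk names, and the rule of failing back only after three consecutive good probes are hypothetical; a production SBC implements this internally, and its parameters and failback behaviour will differ.

```python
from dataclasses import dataclass, field


@dataclass
class ActivePassiveMonitor:
    """Hypothetical active-passive selector driven by OPTIONS probe results."""

    primary: str
    secondary: str
    failure_threshold: int = 2    # consecutive failed probes before failover
    recovery_threshold: int = 3   # assumed: consecutive good probes before failback
    consecutive_failures: int = field(default=0, init=False)
    consecutive_successes: int = field(default=0, init=False)
    active: str = field(init=False)

    def __post_init__(self) -> None:
        # Traffic starts on the primary trunk.
        self.active = self.primary

    def record_probe(self, primary_ok: bool) -> str:
        """Record one OPTIONS probe to the primary; return the trunk now carrying traffic."""
        if primary_ok:
            self.consecutive_failures = 0
            self.consecutive_successes += 1
            # Fail back only after the primary has answered several probes in a row.
            if (self.active == self.secondary
                    and self.consecutive_successes >= self.recovery_threshold):
                self.active = self.primary
        else:
            self.consecutive_successes = 0
            self.consecutive_failures += 1
            # Fail over once the failure threshold is reached.
            if self.consecutive_failures >= self.failure_threshold:
                self.active = self.secondary
        return self.active
```

Feeding it the probe sequence ok, fail, fail, ok, ok, ok keeps traffic on the primary after the first failure, switches to the secondary on the second, and returns to the primary only on the third consecutive good probe. Requiring several good probes before failing back is one way to avoid flapping between trunks when the primary is intermittently reachable.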

Active-passive is simpler to configure and easier to troubleshoot. All traffic flows through a single known path under normal conditions, which makes call quality monitoring and CDR reconciliation more straightforward. The tradeoff is that failover takes time, typically from a few seconds to a minute depending on detection interval settings. During that window new call attempts may fail, and calls already in progress on the failed trunk will usually drop.

Active-passive suits deployments where cost efficiency is prioritised and brief interruptions during failover are acceptable. It is not suitable for environments where even momentary call disruption is unacceptable, such as emergency services or high-frequency trading support lines.

Active-active configuration

In an active-active configuration, two or more trunks carry live traffic simultaneously, typically through load balancing at the session border controller level. Traffic is distributed across the paths according to a configured weighting or a round-robin algorithm. If one trunk degrades or fails, new calls are routed over the remaining trunk or trunks, with no standby path to bring into service first.
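
A minimal sketch of weighted distribution, assuming each trunk has an integer weight and a health flag maintained elsewhere (for example by OPTIONS probes or call-failure tracking). It uses smooth weighted round-robin, one common way to implement weighting; it is not a description of any particular SBC's algorithm, and the class, function, and trunk names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Trunk:
    name: str
    weight: int            # configured share of new calls
    healthy: bool = True   # assumed to be maintained by probes or call-failure tracking
    current: int = 0       # running score used by smooth weighted round-robin


def pick_trunk(trunks: list[Trunk]) -> Trunk:
    """Pick the trunk for the next new call using smooth weighted round-robin.

    Unhealthy trunks are skipped, so when one trunk fails every new call goes
    to the remaining healthy trunk(s).
    """
    candidates = [t for t in trunks if t.healthy and t.weight > 0]
    if not candidates:
        raise RuntimeError("no healthy trunk available")
    total = sum(t.weight for t in candidates)
    # Every candidate earns its weight; the highest running score wins
    # and then pays back the total, which spreads picks evenly over time.
    for t in candidates:
        t.current += t.weight
    chosen = max(candidates, key=lambda t: t.current)
    chosen.current -= total
    return chosen
```

With weights of 2 and 1, the sketch sends new calls in the repeating pattern trunk-a, trunk-b, trunk-a rather than bunching trunk-a's share together. Once trunk-a is marked unhealthy, every new call goes to trunk-b.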

Active-active removes the switch-over step: traffic is already flowing on every path, so the surviving trunks are known to be working and can absorb new calls as soon as the SBC stops sending calls to the failed one. Calls already in progress on the failed trunk will usually still be lost. Active-active also allows the full capacity of every trunk to be utilised under normal load, whereas active-passive leaves the secondary trunk's capacity unused except during failover events.

The complexity tradeoff is real. Active-active requires more sophisticated SBC configuration, routing logic designed to prevent call loops between trunks, and more detailed CDR monitoring to ensure that billing and quality data from both paths is correctly reconciled. SIP registration across two active trunks must also be configured to avoid registration conflicts.
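
CDR reconciliation across two active paths can be sketched as a join on a per-call identifier, with discrepancies grouped by trunk. The record layout, the use of a shared call_id as the join key, and the one-second duration tolerance are all assumptions for illustration; real carrier CDR formats and matching rules vary.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Cdr:
    call_id: str   # assumed join key present in both sides' records
    trunk: str     # trunk the call was carried on
    seconds: int   # billed duration


def _empty_bucket() -> dict[str, list[str]]:
    return {"missing_at_carrier": [], "missing_locally": [],
            "trunk_mismatch": [], "duration_mismatch": []}


def reconcile(platform: list[Cdr], carrier: list[Cdr],
              tolerance_s: int = 1) -> dict[str, dict[str, list[str]]]:
    """Compare our CDRs with the carrier's, grouping discrepancies by trunk."""
    ours = {c.call_id: c for c in platform}
    theirs = {c.call_id: c for c in carrier}
    report = defaultdict(_empty_bucket)
    for call_id, mine in ours.items():
        other = theirs.get(call_id)
        if other is None:
            report[mine.trunk]["missing_at_carrier"].append(call_id)
        elif other.trunk != mine.trunk:
            # Each side recorded the call against a different trunk.
            report[mine.trunk]["trunk_mismatch"].append(call_id)
        elif abs(other.seconds - mine.seconds) > tolerance_s:
            report[mine.trunk]["duration_mismatch"].append(call_id)
    for call_id, their_cdr in theirs.items():
        if call_id not in ours:
            report[their_cdr.trunk]["missing_locally"].append(call_id)
    return dict(report)
```

Grouping by trunk matters in active-active because a discrepancy concentrated on one path usually points at that path's SBC or interconnect rather than at billing logic in general.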

Geographic diversity

Neither configuration provides meaningful redundancy if both trunks terminate at the same physical point of presence (PoP). True resilience requires geographic diversity: primary and secondary trunks terminating at geographically separate PoPs, ideally connected via different network paths. A carrier maintenance event or data centre outage that affects one PoP should not affect the other.
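
The effect of a shared PoP can be shown with the standard availability arithmetic for redundant components. This sketch assumes the two trunk paths fail independently and treats anything both trunks depend on (the same facility, the same upstream link) as a single shared component in series with the pair. The availability figures used below are illustrative, not SLA values.

```python
def pair_availability(a1: float, a2: float, shared: float = 1.0) -> float:
    """Availability of a redundant trunk pair.

    a1, a2: availability of each trunk's own path, assumed to fail independently.
    shared: availability of anything both trunks depend on (same PoP, same
            facility, same upstream link); 1.0 means nothing is shared.
    """
    both_paths_down = (1 - a1) * (1 - a2)   # independent failures coincide
    # The pair is up when at least one path is up AND the shared part is up.
    return (1 - both_paths_down) * shared
```

Two independent paths at 99.9% each give about 99.9999% combined. Put both behind a shared facility that is itself 99.9% available and the pair drops below 99.9%: the shared component caps the result however reliable the individual paths are.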

Infinititel maintains geographically diverse PoPs in Melbourne and Sydney for Australian interconnects, New York and Dallas for US traffic, and London for UK traffic. Partners configuring redundant trunks should use PoPs in different locations rather than two trunks terminating at the same facility.
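
Partners managing many trunk pairs may want to check diversity automatically. The sketch below flags a primary/secondary pair that shares a facility, a city, or a network path. The field names, facility identifiers, and path labels are hypothetical; only the city names come from the PoP list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrunkConfig:
    name: str
    pop_city: str        # city of the terminating PoP, e.g. "Melbourne"
    facility: str        # hypothetical facility identifier
    network_path: str    # hypothetical label for the upstream path or provider


def diversity_warnings(primary: TrunkConfig, secondary: TrunkConfig) -> list[str]:
    """List the failure domains a primary/secondary pair has in common."""
    warnings: list[str] = []
    if primary.facility == secondary.facility:
        warnings.append("same facility")
    if primary.pop_city == secondary.pop_city:
        warnings.append("same city")
    if primary.network_path == secondary.network_path:
        warnings.append("same network path")
    return warnings
```

A Melbourne primary and a Sydney secondary on different paths produce no warnings; the same pair on a single shared path still produces "same network path", which is the kind of hidden common dependency this check is meant to surface.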

Detection and failover timing

For active-passive configurations, the detection interval determines how quickly failover triggers. SIP OPTIONS heartbeats sent every 30 seconds with a two-failure threshold before failover give a worst-case detection time of around 60 seconds. Reducing the interval to 10 seconds with a single-failure threshold reduces that to 10–20 seconds, at the cost of more SIP signalling overhead and a higher risk of false-positive failovers on transient network events.
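
The timing figures above follow from simple arithmetic. Failover needs a set number of consecutive failed probes; in the worst case the trunk fails just after a successful probe, so the wait is roughly the interval multiplied by the threshold, plus however long the SBC waits for a response before counting a probe as failed. That probe timeout is an assumed parameter here, since it varies by SBC.

```python
def detection_window(interval_s: float, threshold: int,
                     probe_timeout_s: float = 0.0) -> tuple[float, float]:
    """Best- and worst-case time from a trunk failing to failover triggering.

    Assumes a probe every interval_s seconds, a probe counted as failed after
    probe_timeout_s with no response, and failover after `threshold`
    consecutive failed probes.
    """
    # Best case: the trunk fails just before a probe is sent.
    best = (threshold - 1) * interval_s + probe_timeout_s
    # Worst case: the trunk fails just after a successful probe,
    # so a full interval passes before the first failed probe.
    worst = threshold * interval_s + probe_timeout_s
    return best, worst


def probes_per_hour(interval_s: float) -> float:
    """OPTIONS requests sent per trunk per hour (signalling overhead)."""
    return 3600 / interval_s
```

With no probe timeout, 30-second probes and a two-failure threshold give a window of 30 to 60 seconds. With an assumed 10-second timeout, 10-second probes and a single-failure threshold give 10 to 20 seconds, while tripling the OPTIONS traffic per trunk from 120 to 360 requests an hour.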

A 30-second OPTIONS interval with a two-failure threshold is a common production compromise between detection speed and false-positive risk. High-availability environments where sub-minute failover is required should consider active-active instead.

Testing your redundancy

Redundant configuration that has never been tested under realistic conditions provides limited assurance. Periodic failover testing — deliberately taking the primary trunk offline during a low-traffic window and verifying that calls route correctly via the secondary — is the only reliable way to confirm that the configuration works as intended. Testing should include both new call origination and in-progress call handling during the failover event.
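
Results from a deliberate failover test are easier to judge with a short summary. This sketch assumes a hypothetical test-call record holding the call's start time relative to the moment the primary was taken offline, the trunk used, whether the call connected, and whether it was still up when the test ended. It reports failed new calls, new calls carried by the secondary, the first successful call on the secondary, and in-progress calls that dropped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailoverTestCall:
    start_s: float   # seconds after the primary was taken offline (negative = before)
    trunk: str       # trunk the call was routed over
    answered: bool   # call was set up successfully
    survived: bool   # call was still up when the test window ended


def summarise_failover(calls: list[FailoverTestCall], secondary: str) -> dict[str, object]:
    """Summarise a deliberate failover test from hypothetical test-call records."""
    before = [c for c in calls if c.start_s < 0]    # in progress at failover
    after = [c for c in calls if c.start_s >= 0]    # originated during/after failover
    failed_new = [c for c in after if not c.answered]
    via_secondary = [c for c in after if c.answered and c.trunk == secondary]
    dropped = [c for c in before if c.answered and not c.survived]
    return {
        "new_calls_failed": len(failed_new),
        "new_calls_on_secondary": len(via_secondary),
        "first_success_on_secondary_s": min((c.start_s for c in via_secondary), default=None),
        "in_progress_calls_dropped": len(dropped),
    }
```

The first successful call on the secondary gives a measured failover time to compare against the detection interval you configured, and the dropped in-progress count shows how calls already up behaved during the event.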

Note

This article is for informational purposes and does not constitute legal or regulatory advice. Requirements change over time. Consult a qualified telecommunications lawyer for advice specific to your situation.