A Twilio-Style Streaming Bridge for Asterisk, FreeSWITCH & AI Systems
Building an AI voice agent is no longer hard.
Connecting that agent to real phone calls (SIP, PBX, PSTN) is.
Most AI systems operate over WebSockets and PCM audio, while production telephony relies on SIP, RTP, codecs, and PBX logic. This mismatch is where most voice-AI projects fail to reach production.
This article explains how NextGenSwitch acts as a telephony abstraction layer, allowing any AI voice agent to communicate with Asterisk, FreeSWITCH, or PSTN networks using a Twilio-style <Connect><Stream> interface and real-time audio streaming.
AI voice systems typically expect:
WebSocket → PCM audio → AI pipeline → PCM audio
Telephony systems operate with:
PSTN → SIP → PBX → RTP (μ-law / A-law)
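Key problems: AI services do not speak SIP or RTP, telephony codecs (μ-law / A-law) must be converted to and from PCM, and audio has to flow in both directions in real time.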
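To make the codec gap concrete: a μ-law RTP stream carries one 8-bit code per sample, while AI pipelines usually expect 16-bit linear PCM. Below is a minimal Python sketch of the standard G.711 μ-law expansion. It illustrates the kind of conversion a bridge has to perform. It is not NextGenSwitch's code, and the function names are hypothetical.

```python
import struct

# Standard G.711 mu-law expansion: one 8-bit code -> one signed 16-bit sample.
# Illustrative only; NextGenSwitch performs codec conversion itself.
BIAS = 0x84  # 132, the bias added during mu-law compression


def ulaw_to_linear(code: int) -> int:
    """Expand a single mu-law byte into a signed 16-bit PCM sample."""
    code = ~code & 0xFF              # mu-law codes are stored bit-inverted
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


def ulaw_bytes_to_pcm16(data: bytes) -> bytes:
    """Convert a mu-law buffer to little-endian 16-bit PCM."""
    samples = [ulaw_to_linear(b) for b in data]
    return struct.pack(f"<{len(samples)}h", *samples)


if __name__ == "__main__":
    frame = bytes([0xFF, 0x7F, 0x80, 0x00])
    print([ulaw_to_linear(b) for b in frame])  # [0, 0, 32124, -32124]
```

Each μ-law byte doubles to two PCM bytes. This is one reason a bridge, rather than the AI service, is a natural place to own this step.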
NextGenSwitch sits between PBX/PSTN infrastructure and AI services.
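It provides a Twilio-style `<Connect><Stream>` interface, automatic codec conversion, and real-time audio streaming over WebSocket.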
Your AI service never touches SIP or RTP directly.
NextGenSwitch works with standard SIP environments, including Asterisk, FreeSWITCH, and SIP trunks to the PSTN:
```text
Caller
  |
[PSTN / SIP Trunk]
  |
[Asterisk / FreeSWITCH]
  |
[NextGenSwitch]
  |
<WebSocket Audio Stream>
  |
[Any AI Voice Service]
```
When a call reaches NextGenSwitch, it fetches XML instructions (similar to TwiML).
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <Stream url="wss://ai.yourdomain.com/ws/voice-agent"/>
  </Connect>
</Response>
```
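These instructions have to be served from somewhere NextGenSwitch can fetch them. The article does not say how the fetch URL is configured or which HTTP method is used. The minimal standard-library sketch below therefore assumes a plain HTTP fetch and answers both GET and POST with the same XML. The port (8080) and the handler name are placeholders.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical endpoint: assumes NextGenSwitch fetches call instructions over
# HTTP from a URL you configure. Port and paths are placeholders.
STREAM_XML = """<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <Stream url="wss://ai.yourdomain.com/ws/voice-agent"/>
  </Connect>
</Response>
"""


class CallInstructionsHandler(BaseHTTPRequestHandler):
    def _send_xml(self) -> None:
        body = STREAM_XML.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/xml")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self) -> None:
        self._send_xml()

    def do_POST(self) -> None:
        # Drain any request body; its fields are not used in this sketch.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        self._send_xml()

    def log_message(self, format, *args):  # silence per-request logging
        pass


if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), CallInstructionsHandler).serve_forever()
```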
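This single instruction connects the call's audio to your AI service over the WebSocket URL given in the `url` attribute of `<Stream>`.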
Parameters are optional. They carry metadata only, exactly like Twilio's `<Parameter>`:
```xml
<Response>
  <Connect>
    <Stream url="wss://ai.yourdomain.com/ws/voice-agent">
      <Parameter name="agent" value="support-bot"/>
      <Parameter name="tenant_id" value="company-01"/>
      <Parameter name="language" value="en-US"/>
      <Parameter name="context" value="sales-inquiry"/>
    </Stream>
  </Connect>
</Response>
```
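When the parameters vary per call (per tenant, say), generating the XML is less error-prone than templating strings by hand. Here is a minimal Python sketch using the standard library. `build_stream_response` is a hypothetical helper, and only the URL is treated as required.

```python
import xml.etree.ElementTree as ET
from typing import Mapping, Optional


def build_stream_response(url: str, parameters: Optional[Mapping[str, str]] = None) -> str:
    """Build a <Response><Connect><Stream> document.

    Only `url` is required; each entry in `parameters` becomes an optional
    <Parameter name="..." value="..."/> child carrying metadata.
    """
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    stream = ET.SubElement(connect, "Stream", url=url)
    for name, value in (parameters or {}).items():
        ET.SubElement(stream, "Parameter", name=name, value=value)
    # ElementTree escapes attribute values (&, <, quotes) for us.
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_stream_response(
        "wss://ai.yourdomain.com/ws/voice-agent",
        {"agent": "support-bot", "tenant_id": "company-01", "language": "en-US"},
    ))
```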
These parameters are delivered to your AI service in the JSON start event and can be used for routing, prompts, or CRM lookups.
NextGenSwitch uses a Twilio Media Streams–style JSON protocol for audio exchange.
Your AI service only needs to understand three event types: `start`, `media` (which flows in both directions), and `stop`.

start: Call Initialization

```json
{
  "event": "start",
  "streamId": "NGS_STREAM_123456",
  "start": {
    "callId": "NGS_CALL_abc",
    "from": "+8801XXXXXXXXX",
    "to": "5000",
    "customParameters": {
      "agent": "support-bot",
      "tenant_id": "company-01",
      "language": "en-US"
    }
  }
}
```
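A typical first step is to turn the `start` event into per-call state and use `customParameters` for routing. The minimal Python sketch below does this. `CallSession`, `PROMPTS`, and `pick_prompt` are hypothetical names. It also assumes, without confirmation from NextGenSwitch, that `customParameters` may be missing when the XML had no `<Parameter>` tags.

```python
import json
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CallSession:
    stream_id: str
    call_id: str
    caller: str
    called: str
    params: Dict[str, str] = field(default_factory=dict)


# Hypothetical routing table: maps the "agent" parameter to a system prompt.
PROMPTS = {"support-bot": "You are a helpful support agent."}
DEFAULT_PROMPT = "You are a helpful voice assistant."


def on_start(raw: str) -> CallSession:
    """Parse a start event into a session keyed by its streamId."""
    msg = json.loads(raw)
    if msg.get("event") != "start":
        raise ValueError(f"expected start event, got {msg.get('event')!r}")
    start = msg["start"]
    return CallSession(
        stream_id=msg["streamId"],
        call_id=start["callId"],
        caller=start["from"],
        called=start["to"],
        # Treated as optional here (an assumption, see lead-in).
        params=dict(start.get("customParameters", {})),
    )


def pick_prompt(session: CallSession) -> str:
    """Choose a prompt from the 'agent' parameter, falling back to a default."""
    return PROMPTS.get(session.params.get("agent", ""), DEFAULT_PROMPT)
```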
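Important: the `streamId` identifies the stream in every message for the call, including the outbound `media` messages your service sends back.

media: Inbound Audio (Caller → AI)

```json
{
  "event": "media",
  "streamId": "NGS_STREAM_123456",
  "media": {
    "payload": "BASE64_AUDIO_BYTES=="
  }
}
```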
Audio characteristics: the `payload` field carries the audio as base64-encoded bytes.
NextGenSwitch handles the SIP, RTP, and codec side of the call, so your AI service receives clean, ordered audio frames.
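Receiving audio means undoing the base64 transport encoding. Here is a minimal Python sketch; `decode_inbound_media` is a hypothetical helper. It only decodes the payload and makes no assumption about the sample format of the resulting bytes. Strict validation rejects malformed payloads, including the literal placeholder shown above.

```python
import base64
import binascii
import json
from typing import Tuple


def decode_inbound_media(raw: str) -> Tuple[str, bytes]:
    """Return (streamId, audio bytes) from an inbound media event."""
    msg = json.loads(raw)
    if msg.get("event") != "media":
        raise ValueError("not a media event")
    try:
        # validate=True rejects characters outside the base64 alphabet.
        audio = base64.b64decode(msg["media"]["payload"], validate=True)
    except binascii.Error as exc:
        raise ValueError("payload is not valid base64") from exc
    return msg["streamId"], audio
```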
media: Outbound Audio (AI → Caller)

```json
{
  "event": "media",
  "streamId": "NGS_STREAM_123456",
  "media": {
    "payload": "BASE64_AUDIO_BYTES=="
  }
}
```
Your AI service sends synthesized audio back using the same structure.
NextGenSwitch converts that audio for the call leg and plays it to the caller.
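Sending audio is the mirror image of receiving it. The article does not say whether synthesized audio must be split into frames. This minimal Python sketch assumes you send it in fixed-size chunks; `frame_bytes` is a hypothetical setting, and both helper names are illustrative.

```python
import base64
import json
from typing import Iterator


def build_outbound_media(stream_id: str, audio: bytes) -> str:
    """Wrap synthesized audio in the same envelope used for inbound media."""
    return json.dumps({
        "event": "media",
        "streamId": stream_id,  # the streamId received in the start event
        "media": {"payload": base64.b64encode(audio).decode("ascii")},
    })


def chunk_audio(audio: bytes, frame_bytes: int) -> Iterator[bytes]:
    """Split a TTS buffer into fixed-size frames; the last may be shorter.

    frame_bytes is a hypothetical setting, not a documented NextGenSwitch value.
    """
    if frame_bytes <= 0:
        raise ValueError("frame_bytes must be positive")
    for offset in range(0, len(audio), frame_bytes):
        yield audio[offset:offset + frame_bytes]


if __name__ == "__main__":
    tts_audio = bytes(400)  # stand-in for synthesized audio
    for frame in chunk_audio(tts_audio, 160):
        print(build_outbound_media("NGS_STREAM_123456", frame)[:60])
```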
stop: Call End

```json
{
  "event": "stop",
  "streamId": "NGS_STREAM_123456",
  "stop": {
    "reason": "hangup"
  }
}
```
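The three events form a simple lifecycle: `start` creates per-stream state, `media` is processed and answered, and `stop` releases the state. Below is a minimal, transport-agnostic Python sketch of that loop. `StreamDispatcher` and its `respond` hook are hypothetical. The hook stands in for whatever STT, LLM, and TTS pipeline you choose, and you would call `handle()` for each text frame your WebSocket server receives.

```python
import base64
import json
from typing import Callable, Dict, List, Optional


class StreamDispatcher:
    """Route start/media/stop events to per-stream state."""

    def __init__(self, respond: Callable[[bytes], Optional[bytes]]):
        # respond: hypothetical hook turning caller audio into reply audio;
        # it returns None (or b"") when there is nothing to say yet.
        self.respond = respond
        self.active: Dict[str, dict] = {}

    def handle(self, raw: str) -> List[str]:
        """Process one inbound JSON frame; return frames to send back."""
        msg = json.loads(raw)
        event = msg.get("event")
        stream_id = msg.get("streamId")
        if event == "start":
            self.active[stream_id] = msg["start"].get("customParameters", {})
        elif event == "media" and stream_id in self.active:
            reply = self.respond(base64.b64decode(msg["media"]["payload"]))
            if reply:
                payload = base64.b64encode(reply).decode("ascii")
                return [json.dumps({"event": "media", "streamId": stream_id,
                                    "media": {"payload": payload}})]
        elif event == "stop":
            # Release per-call state; stop.reason (e.g. "hangup") is informational.
            self.active.pop(stream_id, None)
        # Unknown events, and media for unknown streams, are ignored.
        return []
```

Keeping the dispatcher independent of the WebSocket library makes it easy to test with plain strings. It also lets you swap the AI pipeline without touching the protocol handling.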
NextGenSwitch does not mandate any AI framework.
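You can use any language, runtime, or speech stack that can accept a WebSocket connection and exchange these JSON messages.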
Frameworks like Pipecat can serve as a reference implementation, but they are optional.
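What matters is that your service speaks the WebSocket JSON protocol: it accepts `start`, `media`, and `stop` events and sends `media` events back.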
| Problem | Solution |
|---|---|
| SIP & RTP complexity | Handled by PBX + NextGenSwitch |
| Codec conversion | Automatic |
| Real-time streaming | WebSocket |
| AI vendor lock-in | None |
| Multi-tenant routing | XML + parameters |
| PSTN scalability | SIP-native |
The only hard requirement in the XML is the `url` attribute on `<Stream>`.