– Kamailio SIP Server –

New Dialog Module Design (proposal)

This section proposes a new design for Kamailio's Dialog module. The main goal is proper handling of forking dialogs and of rare special cases, such as spirals.

Introduction

The current Dialog module lacks some important features, notably support for parallel and serial forking. For an initial INVITE, the module creates just a single dialog entry in which it tries to store dialog data such as CSeq, remote target, To-tag and so on. Unfortunately, this is not enough when forking occurs, because different dialogs (or early dialogs) are generated, each of them possibly holding a different CSeq value (e.g., if a forking branch requires 100rel) and having different To-tags and remote targets (along with other dialog-specific information). Correct tracking of these values is needed not only to display statistics properly but also to make MI-triggered session termination work.

At the same time, from the point of view of the UAC it shouldn't matter how many forks the proxy (or a forking proxy behind it, i.e., a remote proxy) has generated, since for that client they should count as just one dialog. For example, imagine a proxy that wants to limit the number of concurrent calls made by a specific user. In that case, it shouldn't matter how many branches the proxy (either local or remote) generates.

The new module design proposed here makes use of two different tables:

  • dialog_in: Stores general and caller data and dialog state.
  • dialog_out: Stores callee data for each generated (early) dialog. Multiple dialog_out entries (when there is local forking in the proxy or forking in a proxy downstream) would reference a single dialog_in entry (see also below the special case of “concurrently confirmed calls”).

dialog_in table

When a dialog is started upon receipt of an INVITE in the proxy, a single entry in dialog_in table is created with state = “proceeding” (in the dlg_onreq() function).

However, there is a constraint that prevents the creation of a new entry in the dialog_in table: if the INVITE's From-tag and Call-ID match an existing entry in the dialog_in table, no new entry is added. Basically, this means that a SIP spiral is observed as a single dialog, which is reasonable since spiraled INVITEs, while triggering the creation of new transactions on the same proxy, still belong to the same dialog (i.e., the same SIP dialog endpoints).
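The creation constraint can be illustrated with a minimal sketch. It is not the module's implementation: the class and method names (DialogInTable, on_initial_invite) are hypothetical, and the table is reduced to an in-memory dictionary keyed by Call-ID and From-tag.

```python
from dataclasses import dataclass


@dataclass
class DialogIn:
    callid: str
    from_tag: str
    state: str = "proceeding"


class DialogInTable:
    """Toy in-memory stand-in for the dialog_in table (hypothetical)."""

    def __init__(self):
        self._entries = {}  # (callid, from_tag) -> DialogIn

    def on_initial_invite(self, callid, from_tag):
        """Return (entry, created); created is False for a spiraled INVITE."""
        key = (callid, from_tag)
        entry = self._entries.get(key)
        if entry is not None:
            return entry, False  # spiral: reuse the existing dialog
        entry = DialogIn(callid, from_tag)
        self._entries[key] = entry
        return entry, True

    def __len__(self):
        return len(self._entries)


table = DialogInTable()
first, created_first = table.on_initial_invite("abcd", "ffff")
again, created_again = table.on_initial_invite("abcd", "ffff")  # spiraled INVITE
```

Here the second call returns the entry created by the first one, so the spiral shows up as a single dialog.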

In order to match responses to requests, the Dialog module registers on the TM module callback event TMCB_RESPONSE_OUT_N during dialog creation. Along with the registration, a reference to the dialog structure is passed so that, during execution of the response callback, the associated dialog can be easily modified without further matching effort.

Definition of the dialog_in table:

| Column | Description | Possible values | Notes |
| id | Table primary key as usual | | |
| hash_entry | Dialog hash entry information | | |
| hash_id | Dialog hash id information | | |
| dialog_id | Dialog ID | | Required for the case of concurrently confirmed calls |
| callid | Value of Call-ID header | | |
| from_tag | Value of tag param in From header | | |
| from_uri | Value of the URI in From header | | Optional |
| caller_original_cseq | Original value of CSeq in the INVITE | | |
| ruri | Value of ruri just when dlg_manage is invoked | | Optional |
| caller_contact | Value of URI in INVITE's Contact header | | |
| caller_route_set | Route records from caller side (proxy to caller) | | Optional |
| caller_sock | Socket in which the INVITE was received | | |
| state | Current status of the dialog from the caller point of view | 1,2,3,4 | Codes explained below |
| start_time | Start time of the dialog | | |
| timeout | Timeout value set for this dialog | | |
| toroute | Index of the route to be executed at timeout | | |

dialog_out table

The dialog_out table contains one entry for every early dialog generated for a given dialog_in entry (i.e., SIP dialog). The entries live there as long as the dialog is in the early state. As soon as it transitions to confirmed or terminated, all but one of the dialog_out entries become obsolete and will be destroyed (after some seconds, due to the TM timer).

The dialog_out entries related to a specific dialog_in entry are changed (i.e., added, deleted, and modified) when a response is forwarded by a proxy towards the caller. (Note that this differs from the current implementation with respect to the point in time when callee data is stored: while this currently happens only when a dialog is confirmed, the new implementation will start filling in callee data as soon as possible, i.e., when a response that moves the dialog to the early state already contains it.) The forwarding occasions (namely, when 1XX responses other than 100 and [23456]XX responses are received) suffice because they match exactly the times when dialog state needs to be adjusted.

Definition of the dialog_out table:

| Column | Description | Possible values | Notes |
| id | Table primary key as usual | | |
| dialog_id | Dialog ID | | This value must reference an entry in dialog_in table |
| to_tag | Value of tag param in To header | | |
| caller_cseq | Value of CSeq in the caller for this (early-)dialog | | This value could be different for each early dialog |
| callee_cseq | Value of CSeq in the callee for this (early-)dialog | | |
| callee_contact | Value of URI in Contact header of the response (if present) | | |
| callee_route_set | Route records from callee side (proxy to callee) | | Optional |
| callee_sock | Socket from which the INVITE was relayed | | |
| dflags | Flags for this (early-)dialog | | Useful for modules on top of Dialog module |

Dialog state values

The possible dialog state values are:

  • 1 (proceeding): There is no provisional response yet.
  • 2 (early): A provisional response (1XX other than 100) has been received.
  • 3 (confirmed): A final 2XX response has been received.
  • 4 (terminated): A final negative [3456]XX response or BYE request has been received.

Note that there is no state for “confirmed but ACK pending”. According to RFC 3261, a dialog is confirmed when the 200 is received; there is no need to wait for the ACK. However, if the ACK is not received by the UAS within 32 seconds, the UAS would terminate the dialog by sending a BYE.
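The four state codes and the response-driven transitions can be summarized in a short sketch. The names DlgState and state_after_response are hypothetical. Leaving the “confirmed” and “terminated” states untouched is a simplifying assumption, since the function only models responses to the initial INVITE.

```python
from enum import IntEnum


class DlgState(IntEnum):
    PROCEEDING = 1
    EARLY = 2
    CONFIRMED = 3
    TERMINATED = 4


def state_after_response(state, status):
    """State of a dialog after a response to its initial INVITE is forwarded."""
    if state not in (DlgState.PROCEEDING, DlgState.EARLY):
        return state  # assumption: confirmed/terminated are not changed here
    if 100 < status < 200:
        return DlgState.EARLY
    if 200 <= status < 300:
        return DlgState.CONFIRMED
    if 300 <= status < 700:
        return DlgState.TERMINATED
    return state  # 100 Trying (or anything else) leaves the state as it is
```

For instance, a 180 moves a “proceeding” dialog to “early”, and a subsequent 200 moves it to “confirmed” without waiting for the ACK.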

Response processing

When a response is forwarded by a proxy, the TM module invokes the Dialog module callback TMCB_RESPONSE_OUT_N for the previously generated dialog. For security reasons (i.e., forgery prevention), the callback ignores the response if its Call-ID or From-tag doesn't match the entry in the dialog_in table with the same dialog_id. If it is a valid response, the following algorithm is applied to track dialogs appropriately.

  1. First, check whether there is already a dialog_out entry with the same To-tag (for the same dialog):
    1. If not, create a new dialog_out entry and fill it according to the content of the response (set callee_contact and callee_route_set if the response contains “Contact” and “Record-Route” headers).
    2. If so, update the matching dialog_out entry in case the response includes new information (replace callee_contact and callee_route_set if the response contains “Contact” and “Record-Route” headers).
  2. Second, inspect the response status code:
    1. If it's a provisional status code except for 100 (i.e., 100 < x < 200) then set the dialog state to “early” in the dialog_in entry (if it was in “proceeding” state).
    2. If it's a final negative status code (300 <= x < 700) set the dialog state to “terminated”. When the transaction expires the dialog_in and dialog_out entries for the current dialog are cleaned up (similar to what is outlined in draft-sparks-sipcore-invfix, the TM module de facto uses an “Accepted” state; the expiration value varies from 64*T1 though).
    3. If it's a final positive status code (i.e., 200 <= x < 300) check the dialog_in state for the matching dialog:
      1. If it's “proceeding” or “early” set it to “confirmed”. Remove all the other entries in dialog_out for the same dialog after TM expires the transaction (not before in order to absorb late in-early-dialog requests).
      2. If it's “confirmed” then this is a “concurrently confirmed call” case. Create a new Dialog ID token “X” and assign it to the created or updated dialog_out entry. Then, duplicate the dialog_in entry and set its Dialog ID value to “X” (see below why this operation is required). When TM expires the transaction remove all the other entries in dialog_out for the same dialog except those associated to established dialogs.
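The algorithm above can be sketched as follows. This is an illustration under stated assumptions, not the module's code: the DialogTracker class, its method names, the “hash ID plus letter” scheme for new dialog IDs, and the established helper set (used to tell a retransmitted 2XX from a concurrently confirmed call) are all hypothetical. The guard that applies a negative final response only in the “proceeding” or “early” state is likewise an assumption.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass
class DialogIn:
    dialog_id: str
    callid: str
    from_tag: str
    caller_original_cseq: int
    state: str = "proceeding"


@dataclass
class DialogOut:
    dialog_id: str
    to_tag: str
    caller_cseq: int
    callee_contact: Optional[str] = None
    callee_route_set: Optional[List[str]] = None


class DialogTracker:
    """Tracks one initial INVITE; IDs are hash_id plus a letter (hypothetical)."""

    def __init__(self, hash_id, callid, from_tag, cseq):
        self.hash_id = hash_id
        first = DialogIn(hash_id + "a", callid, from_tag, cseq)
        self.dialog_in = {first.dialog_id: first}
        self.dialog_out = {}      # to_tag -> DialogOut
        self.established = set()  # To-tags that got a 2XX (helper, not in the proposal)

    def on_response_out(self, callid, from_tag, to_tag, status,
                        contact=None, route_set=None):
        base = self.dialog_in[self.hash_id + "a"]
        if (callid, from_tag) != (base.callid, base.from_tag):
            return False  # forgery check: ignore the response
        # Step 1: create or update the dialog_out entry for this To-tag.
        out = self.dialog_out.get(to_tag)
        if out is None:
            out = DialogOut(base.dialog_id, to_tag, base.caller_original_cseq)
            self.dialog_out[to_tag] = out
        if contact is not None:
            out.callee_contact = contact
        if route_set is not None:
            out.callee_route_set = list(route_set)
        # Step 2: adjust the dialog_in state according to the status code.
        din = self.dialog_in[out.dialog_id]
        if 100 < status < 200:
            if din.state == "proceeding":
                din.state = "early"
        elif 300 <= status < 700:
            if din.state in ("proceeding", "early"):  # guard is an assumption
                din.state = "terminated"
        elif 200 <= status < 300:
            if din.state in ("proceeding", "early"):
                din.state = "confirmed"
            elif din.state == "confirmed" and to_tag not in self.established:
                # Concurrently confirmed call: new ID, duplicated dialog_in.
                new_id = self.hash_id + chr(ord("a") + len(self.dialog_in))
                out.dialog_id = new_id
                self.dialog_in[new_id] = replace(din, dialog_id=new_id)
            self.established.add(to_tag)
        return True

    def on_transaction_expired(self):
        """Cleanup once TM drops the INVITE transaction."""
        if self.established:
            # Keep only the dialog_out entries of established dialogs.
            self.dialog_out = {t: o for t, o in self.dialog_out.items()
                               if t in self.established}
        else:
            # No 2XX was seen: the dialog failed, so drop everything.
            self.dialog_in.clear()
            self.dialog_out.clear()
```

Feeding two 180 responses followed by two 200 responses with different To-tags into on_response_out() leaves two dialog_in entries whose IDs end in “a” and “b”, each referenced by one dialog_out entry.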

In-dialog request processing

When an in-dialog request arrives it's matched based on the RR cookie inserted in the original Record-Route header (so it will be present in the Route header of all the requests within same dialog). Legacy dialog matching (From-tag, To-tag, and Call-ID) is not valid for a proxy in scenarios in which a spiral occurs, as an in-dialog request would be matched twice for the same dialog. In order to enhance legacy matching mode to manage spirals correctly, the route-set would need to be stored on creation of the dialog structure and later compared on each in-dialog spiral hop. However, comparison of route sets proves to be difficult, not only for computational but also for several (mostly UA-induced) reasons:

  • The (loose) route set order may change;
  • a single Record-Route header may be split into multiple headers;
  • the letter case (of domain parts) may change;
  • and others.

Therefore, dialog matching based on RR cookies will be the preferred and, initially, only implemented method (of course, anyone is free and welcome to upgrade legacy matching mode).
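As an illustration of cookie-based matching, the sketch below extracts a dialog reference from a Route header value. The proposal does not define the cookie layout, so the parameter name “did” and the “hash_entry.hash_id” hexadecimal encoding used here are assumptions made for the example only, as is the function name parse_rr_cookie.

```python
import re

# Hypothetical cookie layout: ";did=<hash_entry>.<hash_id>", both in hex.
_COOKIE = re.compile(r";did=([0-9a-fA-F]+)\.([0-9a-fA-F]+)")


def parse_rr_cookie(route_value):
    """Return (hash_entry, hash_id) from a Route header value, or None."""
    match = _COOKIE.search(route_value)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)
```

A request whose top Route carries no such cookie yields None and would simply not be matched to any dialog at that hop.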

An in-dialog BYE for a confirmed dialog sets the state to “terminated”. When the call state is “early” the UAC may send a BYE to terminate a specific early-dialog only (as opposed to sending a CANCEL which would terminate the whole INVITE transaction and all early-dialogs, that is, the entire session). In that latter case, the entire call's state is changed to “terminated” if only one early-dialog existed, i.e., no proxy forking occurred.

Dialog counter

From the point of view of the client, a single dialog exists even if it forks into various early-dialogs. Therefore, the script function or MI command should take into account just the entries in the dialog_in table in order to know the number of dialogs for a specific profile.
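A minimal sketch of such a counter follows. The proposed dialog_in table has no profile column, so the “profiles” field and both function names are hypothetical and only serve to show that dialog_out entries play no role in the count.

```python
def count_profile_dialogs(dialog_in_rows, profile, value):
    """Count dialog_in rows tagged with profile=value; dialog_out is ignored."""
    return sum(1 for row in dialog_in_rows
               if row.get("profiles", {}).get(profile) == value)


def may_start_call(dialog_in_rows, user, limit):
    """True if the user is still below the concurrent-call limit."""
    return count_profile_dialogs(dialog_in_rows, "caller", user) < limit


rows = [
    {"dialog_id": "1111a", "profiles": {"caller": "alice"}},
    {"dialog_id": "2222a", "profiles": {"caller": "carol"}},
]
```

With these rows, Alice counts as having one dialog regardless of how many branches her INVITE was forked into.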

Spiraled requests

The current module design allows the use of callbacks based on dialog events (register_dlgcb()). These callbacks must be invoked when the state of the dialog changes.

In order to allow users to perform custom actions when a request spirals (i.e., a SIP request is routed through a proxy multiple times), the new callback DLGCB_SPIRALED will be introduced. It will be triggered each time a proxy receives an INVITE request when the dialog has already been created (i.e., the INVITE is passing the proxy for the second, third, or any other subsequent time). Although DLGCB_SPIRALED does not reflect a change in dialog state, it is still useful and needed on certain occasions, e.g., when modules depend on it.
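The intended callback flow can be sketched in a few lines. This is a toy model and not the module's C API: the bit values, the Dialog and DialogModule classes, and the simplified register_dlgcb() signature are assumptions for illustration.

```python
DLGCB_CREATED = 1 << 0    # bit values are placeholders
DLGCB_SPIRALED = 1 << 1


class Dialog:
    def __init__(self, callid, from_tag):
        self.callid, self.from_tag = callid, from_tag
        self._callbacks = []

    def register_dlgcb(self, cb_types, func):
        self._callbacks.append((cb_types, func))

    def run_callbacks(self, cb_type):
        for cb_types, func in self._callbacks:
            if cb_types & cb_type:
                func(self, cb_type)


class DialogModule:
    def __init__(self):
        self._dialogs = {}
        self._on_created = []

    def register_created(self, func):
        self._on_created.append(func)

    def on_invite(self, callid, from_tag):
        key = (callid, from_tag)
        dlg = self._dialogs.get(key)
        if dlg is None:
            dlg = self._dialogs[key] = Dialog(callid, from_tag)
            for func in self._on_created:
                func(dlg, DLGCB_CREATED)
        else:
            dlg.run_callbacks(DLGCB_SPIRALED)  # no state change, just a notice
        return dlg


spirals = []
module = DialogModule()
module.register_created(
    lambda dlg, _type: dlg.register_dlgcb(
        DLGCB_SPIRALED, lambda d, t: spirals.append(d.callid)))
for _ in range(3):
    module.on_invite("abcd", "ffff")
```

After the loop, the spiral callback has run for the second and third INVITE but not for the first one, which created the dialog.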

Multiple 200 responses

For a single request, multiple 200 responses may arrive at the same time, thereby resulting in the confirmation of multiple dialogs. Dialogs created this way are referred to as concurrently confirmed calls in the following.

For each such established call, a separate dialog_in entry will be generated, which allows module callback users to track the calls separately and according to their needs (instead of having to track them either together or not at all). However, because concurrently confirmed calls all point to the same dialog hash ID, and because that hash ID must be chosen during request processing, the dialog_in table uses a dialog ID to accommodate multiple dialogs maintained under the same hash ID.

Once a concurrently confirmed call is created in the Dialog module, the new callback DLGCB_CREATED_CONCUR will be triggered. Users may register for these callbacks just like they may do for DLGCB_CREATED, i.e., without the need to pass an existing dialog structure to the register function.

Dialog flags

Each (early-)dialog in the dialog_out table can carry specific flags (dflags field). The meaning of these flags depends on the other modules that make use of the Dialog module. For example, the MediaProxy module could use these flags to force the MediaProxy server only for certain (early-)dialogs (those detected as natted) while still using a convenient function like “engage_mediaproxy()” (which currently fails in forking cases as MediaProxy is applied to all or none of the dialogs).

This would involve the new functions setdflag(), resetdflag() and isdflagset(). These functions can be invoked only in “onreply_route” (the reason is that during “route”, “branch_route” and “failure_route” there is no dialog info yet).
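The three functions could behave like ordinary bit-flag helpers, as in the sketch below. Treating the flag argument as a bit position and representing a dialog_out entry as a dictionary are assumptions; FLAG_NATTED is a hypothetical flag chosen for the example.

```python
def setdflag(entry, flag):
    """Set bit number `flag` in the entry's dflags field."""
    entry["dflags"] |= 1 << flag


def resetdflag(entry, flag):
    """Clear bit number `flag` in the entry's dflags field."""
    entry["dflags"] &= ~(1 << flag)


def isdflagset(entry, flag):
    """Return True if bit number `flag` is set."""
    return bool(entry["dflags"] & (1 << flag))


FLAG_NATTED = 3  # hypothetical flag index

bob1 = {"to_tag": "bbb111", "dflags": 0}
bob2 = {"to_tag": "bbb222", "dflags": 0}
setdflag(bob2, FLAG_NATTED)  # only the natted branch is marked
```

Because the flags live in each dialog_out entry, marking Bob-2 as natted leaves Bob-1 untouched, which is what a per-branch decision needs.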

Proxy-initiated dialog termination

Through an MI function called dlg_end_dlg, Kamailio allows a dialog to be terminated by sending out two locally generated BYE requests to the involved UAs. The future Dialog module implementation should improve this functionality with regard to the following aspects:

  • It should be possible to terminate dialogs in the “early” state, i.e., sending out BYE/CANCEL requests in order to terminate all branches appropriately.
    • ibc: IMHO it would be easier just to cancel the transaction as when fr_inv_timer expires, that is, by sending a CANCEL to all the pending branches and a 408 to the UAC (perhaps in this case a 480 would be more appropriate).
  • Dialog termination should not be restricted to two parties only.
  • The CSeq numbers of proxy-initiated requests should be high enough such that requests sent out by UAs at the same time the proxy does will not shadow proxy requests. That is, proxy-initiated requests to terminate a dialog should have precedence over UA-initiated requests.
  • The point in session time when dialog structures are destroyed by the proxy should not depend on whether the proxy or a UA terminates the call. As of now, dialog structures are destroyed for UA-initiated terminations as soon as a UA-initiated BYE request is forwarded by the proxy, whereas for proxy-initiated terminations the reception of the final responses to the BYE requests from both parties is awaited. This conceptual mismatch should be removed in favor of the better of the two approaches. The advantage of not destroying dialog structures prior to the reception of the final response is that the proxy can make sure that the UA actually received and properly processed the BYE request. If that check is skipped and the dialog structure destroyed immediately instead, the UA could send a new in-dialog request on its own (say, a re-INVITE), which the Dialog module would not be able to associate with an existing dialog and for which it would construct a new one instead. Consequently, the dialog would dangle until expiration and consume resources. To avoid this kind of race condition, it is advisable for the proxy to take the extra effort of parsing responses to ensure proper session teardown.
  • It should be possible to terminate a dialog by providing its Call-ID and From-tag. The current implementation of dlg_end_dlg requires passing the dialog's hash_id and hash_entry, which are module-internal values that can be retrieved via MI dlg_list (so terminating a dialog takes two MI calls).

Examples

Parallel forking

Alice calls Bob and the proxy forks in parallel to both locations of Bob (Bob-1 and Bob-2):

  • The INVITE (CSeq 1) arrives to the proxy and a dialog is generated:
dialog_in hash_id dialog_id callid from_tag caller_contact state
1111 1111a abcd ffff sip:alice@home.org proceeding
  • The proxy does parallel forking and calls Bob-1 and Bob-2. First, Bob-1 replies 180 (with no Contact). As the response is forwarded by the proxy, a new dialog_out entry is created and the dialog_in entry's state updated accordingly.
dialog_in hash_id dialog_id callid from_tag caller_contact state
1111 1111a abcd ffff sip:alice@home.org early
dialog_out dialog_id to_tag caller_cseq callee_cseq callee_contact
1111a bbb111 1 (not set) (not set)
  • Bob-2 replies 180 with Contact and requires “100rel” (PRACK). The forwarding generates a new dialog_out entry:
dialog_in hash_id dialog_id callid from_tag caller_contact state
1111 1111a abcd ffff sip:alice@home.org early
dialog_out dialog_id to_tag caller_cseq callee_cseq callee_contact
1111a bbb111 1 (not set) (not set)
1111a bbb222 1 (not set) sip:bob2@2.2.2.2
  • Alice sends in-dialog PRACK for Bob-2 (CSeq + 1):
dialog_in hash_id dialog_id callid from_tag caller_contact state
1111 1111a abcd ffff sip:alice@home.org early
dialog_out dialog_id to_tag caller_cseq callee_cseq callee_contact
1111a bbb111 1 (not set) (not set)
1111a bbb222 2 (not set) sip:bob2@2.2.2.2
  • Bob-1 replies 480. Since there is still one branch remaining, the failure response is not forwarded yet, so neither the dialog_in nor the dialog_out table changes.
  • Bob-2 replies 200. This concludes the dialog successfully, triggers forwarding, and causes the deletion of all but one dialog_out entry (after the INVITE transaction has been cleaned up by TM). Additionally, the state is updated in the dialog_in entry:
dialog_in hash_id dialog_id callid from_tag caller_contact state
1111 1111a abcd ffff sip:alice@home.org confirmed
dialog_out dialog_id to_tag caller_cseq callee_cseq callee_contact
1111a bbb222 2 (not set) sip:bob2@2.2.2.2
  • Bob-2 sends an in-dialog INFO (starting remote CSeq with value 101), not affecting the state:
dialog_in hash_id dialog_id callid from_tag caller_contact state
1111 1111a abcd ffff sip:alice@home.org confirmed
dialog_out dialog_id to_tag caller_cseq callee_cseq callee_contact
1111a bbb222 2 101 sip:bob2@2.2.2.2
  • Bob-2 sends BYE (CSeq + 1) and causes the dialog state to transition:
dialog_in hash_id dialog_id callid from_tag caller_contact state
1111 1111a abcd ffff sip:alice@home.org terminated
dialog_out dialog_id to_tag caller_cseq callee_cseq callee_contact
1111a bbb222 2 102 sip:bob2@2.2.2.2
  • After some TM-dependent timeout, the transaction will finally be deleted. At this point, the dialog table entries may be cleaned up too.

Spiral

A complex case in which the INVITE is routed through a proxy twice and later serially forked to an IVR server. No new dialog_in entry is created upon the second (spiraled) receipt of same INVITE. However, custom modules which may still need to interact at this point may do so by means of the DLGCB_SPIRALED callback.

In the example scenario, Alice tries to call Bob:

Alice ----> P1 ----> P2 ----> P1 ----> Bob

However, Bob rejects the call and P2 decides to reroute it to an IVR server:

Alice ----> P1 ----> P2 ----> IVR

Let's inspect the behavior of the Dialog module in P1:

  • The INVITE (CSeq 1) arrives to the proxy P1 and a dialog is generated:
dialog_in hash_id dialog_id callid from_tag caller_contact state
1111 1111a abcd ffff sip:alice@home.org proceeding
  • The proxy routes the request to P2 and P2 again to P1 so a new INVITE server transaction is created. However, this new INVITE wouldn't create a new dialog (even if requested in the config script) because there is already an entry in the dialog_in table with the same Call-ID and From-tag. That is, a spiral has occurred.
    • If users or modules need to be aware of spiraled messages, however, they may hook up to the DLGCB_SPIRALED callback. That is, when the DLGCB_CREATED callback is executed, registering an additional callback for the same dialog and dialog type DLGCB_SPIRALED will result in that registered callback being run each time the dialog is spiraled.
  • The INVITE is routed to Bob which replies 180 with Contact and Record-Route mirror (so route-set is already set in the UAC side). Such response is matched in the TM module by the last INVITE transaction; however, no Dialog callback will be executed because the Dialog module did not register one when it detected spiraling. It only did for the very first INVITE transaction, i.e., at the very beginning of the spiral.
  • The 180 response unwinds the spiral and now it's matched against the first INVITE transaction, which invokes the callback for the first dialog:
dialog_in hash_id dialog_id callid from_tag caller_contact state
1111 1111a abcd ffff sip:alice@home.org early
dialog_out dialog_id to_tag caller_cseq callee_cseq callee_contact
1111a aaaa 1 (not set) sip:bob@1.2.3.4
  • Alice sends in-dialog INFO (valid during an early-dialog if route-set and remote target is set). The Dialog module inspects the Route cookie and matches the dialog:
dialog_in hash_id dialog_id callid from_tag caller_contact state
1111 1111a abcd ffff sip:alice@home.org early
dialog_out dialog_id to_tag caller_cseq callee_cseq callee_contact
1111a aaaa 2 (not set) sip:bob@1.2.3.4
  • The INFO does the spiral and again arrives at the proxy. This time it's not matched against an existing dialog entry (as the cookie of the top Route is different now due to the loose-routing mechanism).
  • Bob replies 480. Again the reply matches second INVITE transaction with no dialog involved.
  • The 480 response is now received by P2, which sends the ACK to Bob and generates a new serial branch to the IVR server. Note that the Dialog module knows nothing about the 480 response.
  • The IVR server replies a 200 (with a new To-tag, of course). Such 200 is sent by P2 to P1 and matches the first INVITE transaction so the dialog callback generates a new dialog_out entry for it. Then the response processing algorithm is performed and the previously created entry removed (after a while). The tables become:
dialog_in hash_id dialog_id callid from_tag caller_contact state
1111 1111a abcd ffff sip:alice@home.org confirmed
dialog_out dialog_id to_tag caller_cseq callee_cseq callee_contact
1111a bbbb 1 (not set) sip:ivr@provider.com

IMPORTANT: The new dialog_out has caller_cseq with value “1” (the original value) as no in-dialog request has been sent by the UAC to this branch. This is achieved by copying field caller_original_cseq from dialog_in table to the new entry under dialog_out table.

  • Alice sends the ACK. As explained before the ACK is just ignored from the Dialog module's point of view.
  • After some time Alice sends a BYE for the dialog established with the IVR server (Alice's CSeq becomes “2”). It's matched against the dialog_in entry and the state is updated:
dialog_in hash_id dialog_id callid from_tag caller_contact state
1111 1111a abcd ffff sip:alice@home.org terminated
dialog_out dialog_id to_tag caller_cseq callee_cseq callee_contact
1111a bbbb 2 (not set) sip:ivr@provider.com
  • Again, after some timeout, both entries will be destroyed.
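The CSeq handling highlighted in the IMPORTANT note can be sketched as follows. The dictionary layout and the function names new_dialog_out and on_caller_request are hypothetical; only the idea of seeding caller_cseq from caller_original_cseq comes from the proposal.

```python
def new_dialog_out(dialog_in, to_tag):
    """Create a dialog_out row whose caller_cseq starts at the original CSeq."""
    return {
        "dialog_id": dialog_in["dialog_id"],
        "to_tag": to_tag,
        "caller_cseq": dialog_in["caller_original_cseq"],
        "callee_cseq": None,
    }


def on_caller_request(dialog_out_entry, cseq):
    """Record the CSeq of an in-dialog request sent by the caller."""
    dialog_out_entry["caller_cseq"] = cseq


din = {"dialog_id": "1111a", "caller_original_cseq": 1}
bob = new_dialog_out(din, "aaaa")
on_caller_request(bob, 2)          # Alice's INFO towards Bob
ivr = new_dialog_out(din, "bbbb")  # later branch to the IVR server
```

The entry for the IVR branch starts again at CSeq 1 even though the entry for Bob has already advanced to 2, matching the tables in the example.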

Concurrently confirmed calls

The following example scenario will try to illustrate the behavior for concurrently confirmed calls. It will be as simple as this:

Alice ----> proxy ----> Bob

Let's inspect the behavior of the Dialog module on the proxy:

Alice calls Bob and the proxy forks in parallel to both locations of Bob (Bob-1 and Bob-2):

  • The INVITE arrives at the proxy and a dialog is generated. This comprises a new dialog_in entry:
dialog_in hash_id dialog_id callid from_tag caller_contact state
1111 1111a abcd ffff sip:alice@home.org proceeding
  • The proxy does parallel forking and calls Bob-1 and Bob-2. First, Bob-1 replies 180 (with no Contact). As the response is forwarded to Alice, a new dialog_out entry is created and the dialog_in entry state updated accordingly:
dialog_in hash_id dialog_id callid from_tag caller_contact state
1111 1111a abcd ffff sip:alice@home.org early
dialog_out dialog_id to_tag caller_cseq callee_cseq callee_contact
1111a gggg 1 (not set) (not set)
  • Next, Bob-2 replies 180 (no Contact) which is again forwarded to Alice, triggering the creation of a new dialog_out:
dialog_in hash_id dialog_id callid from_tag caller_contact state
1111 1111a abcd ffff sip:alice@home.org early
dialog_out dialog_id to_tag caller_cseq callee_cseq callee_contact
1111a gggg 1 (not set) (not set)
1111a hhhh 1 (not set) (not set)
  • Now Bob-1 replies 200. As in the previous examples, all other dialog_out entries would be destroyed after TM cleans the INVITE transaction. For now the tables look as follows:
dialog_in hash_id dialog_id callid from_tag caller_contact state
1111 1111a abcd ffff sip:alice@home.org confirmed
dialog_out dialog_id to_tag caller_cseq callee_cseq callee_contact
1111a gggg 1 (not set) sip:bob1@1.1.1.1
1111a hhhh 1 (not set) (not set)
  • Immediately afterwards, Bob-2 replies 200 as well. The delay is too short for the proxy to cancel Bob-2 yet, so the response matches the INVITE transaction (in “Accepted” state) and the proxy forwards it as well, thereby establishing another dialog. That is, the response matches the second entry in dialog_out, so the dialog_id is retrieved (“1111a”). The Dialog module notices that there is already an established dialog with dialog_id = “1111a”, so it creates a new dialog ID “1111b”, updates the matching dialog_out entry with it and also duplicates the entry under dialog_in, setting its dialog ID to “1111b”:
dialog_in hash_id dialog_id callid from_tag caller_contact state
1111 1111a abcd ffff sip:alice@home.org confirmed
1111 1111b abcd ffff sip:alice@home.org confirmed
dialog_out dialog_id to_tag caller_cseq callee_cseq callee_contact
1111a gggg 1 (not set) sip:bob1@1.1.1.1
1111b hhhh 1 (not set) sip:bob2@2.2.2.2
  • Alice ACKs both calls (again, not affecting the dialog states at all).
  • Alice BYEs the call established from Bob-2 with To-tag “hhhh”. As the request is routed within the proxy, it extracts the hash ID (“1111”) from the Route header and finds that two dialog_ins match, i.e., the dialogs with the dialog_id values “1111a” and “1111b”.
  • Comparing the BYE request's To-tag with those from the dialog_out entries where the dialog_id is “1111a” and “1111b”, respectively, the proxy determines that the request is directed for the dialog_in entry with dialog_id “1111b”. The dialog state is updated accordingly:
dialog_in hash_id dialog_id callid from_tag caller_contact state
1111 1111a abcd ffff sip:alice@home.org confirmed
1111 1111b abcd ffff sip:alice@home.org terminated
dialog_out dialog_id to_tag caller_cseq callee_cseq callee_contact
1111a gggg 1 (not set) sip:bob1@1.1.1.1
1111b hhhh 1 (not set) sip:bob2@2.2.2.2

Once the transaction layer's post-mortem timeout triggers, the Dialog module will clean up the terminated dialog_in and dialog_out entries. Meanwhile, the other concurrently confirmed call remains:

dialog_in hash_id dialog_id callid from_tag caller_contact state
1111 1111a abcd ffff sip:alice@home.org confirmed
dialog_out dialog_id to_tag caller_cseq callee_cseq callee_contact
1111a gggg 1 (not set) sip:bob1@1.1.1.1
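The two-step matching used for Alice's BYE in this example (hash ID first, then To-tag) can be sketched as follows. The function name match_in_dialog and the list-of-dictionaries representation of the tables are assumptions. The sketch covers only requests sent by the caller, where the To-tag identifies the callee side.

```python
def match_in_dialog(hash_id, to_tag, dialog_in, dialog_out):
    """Return the dialog_id a caller request belongs to, or None."""
    # Step 1: all dialog_in entries sharing the hash ID from the Route cookie.
    candidates = {row["dialog_id"] for row in dialog_in
                  if row["hash_id"] == hash_id}
    # Step 2: pick the one whose dialog_out entry carries the request's To-tag.
    for row in dialog_out:
        if row["dialog_id"] in candidates and row["to_tag"] == to_tag:
            return row["dialog_id"]
    return None


dialog_in = [
    {"hash_id": "1111", "dialog_id": "1111a", "state": "confirmed"},
    {"hash_id": "1111", "dialog_id": "1111b", "state": "confirmed"},
]
dialog_out = [
    {"dialog_id": "1111a", "to_tag": "gggg"},
    {"dialog_id": "1111b", "to_tag": "hhhh"},
]
target = match_in_dialog("1111", "hhhh", dialog_in, dialog_out)
```

With the tables from the example, the BYE carrying To-tag “hhhh” resolves to the dialog with dialog_id “1111b”, while the call towards Bob-1 stays untouched.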

TODO and ideas

  • Like AVP's during dialog's lifetime.