The Parser Organization

The server implements what we call incremental parsing: a header field is not parsed unless it is really needed. Only a minimal set of header fields is parsed every time. The set includes:

The First Line Parser

The purpose of this parser is to parse the first line of a SIP message. The first line is represented by the msg_start structure defined in file parse_fline.h under the parser subdirectory.

The main function of the first line parser is parse_first_line. The function fills in the msg_start structure.

Follow inline comments in the function if you want to add support for a new message type.

The Header Field Name Parser

The purpose of the header field name parser is to recognize the type of a header field. The following header field types are recognized:

Via, To, From, CSeq, Call-ID, Contact, Max-Forwards, Route, Record-Route, Content-Type, Content-Length, Authorization, Expires, Proxy-Authorization, WWW-Authorization, Supported, Require, Proxy-Require, Unsupported, Allow, Event.

All other header field types will be marked as HDR_OTHER.

The main function of the header name parser is parse_hname2, which can be found in file parse_hname2.c. The function accepts pointers to the beginning and end of a header field and fills in an hdr_field structure. The name field will point to the header field name, the body field will point to the header field body, and the type field will contain the type of the header field if known and HDR_OTHER if unknown.

The parser is 32-bit, which means that it processes 4 characters of a header field name at a time. The 4 characters are converted to an integer and the integer is then compared. This is much faster than comparing byte by byte: because the server is compiled on architectures that are at least 32-bit, such a comparison will be compiled into one instruction instead of 4.

We did some performance measurements, and 32-bit parsing is about 3 times faster for a typical SIP message than a corresponding automaton comparing byte by byte. Performance may vary depending on the message size, the header fields parsed and their types. Tests showed that it was always at least as fast as the corresponding 1-byte automaton.
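The 4-bytes-at-a-time comparison can be sketched as follows. This is an illustration only: read4 and starts_with_Max are hypothetical helper names, not functions from the server. The bytes are packed explicitly with the first character in the lowest 8 bits, which reproduces the constants from keys.h on any host.

```c
#include <stdint.h>

/* Value listed in keys.h for the chunk "Max-". */
#define _Max__ 0x2d78614d

/*
 * Pack 4 bytes into one 32-bit integer, first byte in the lowest
 * 8 bits. This is the value a little-endian CPU would see when
 * loading the 4 bytes directly. read4 is a hypothetical helper name.
 */
static uint32_t read4(const char *p)
{
    const unsigned char *u = (const unsigned char *)p;

    return (uint32_t)u[0]
         | ((uint32_t)u[1] << 8)
         | ((uint32_t)u[2] << 16)
         | ((uint32_t)u[3] << 24);
}

/* One integer comparison replaces four character comparisons. */
static int starts_with_Max(const char *name)
{
    return read4(name) == _Max__;
}
```

The caller must make sure that at least 4 bytes are readable at the given position.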

Since the comparison of header field names must be case insensitive, the name must be converted to lower case before it is compared. Because converting byte by byte would slow the parser down considerably, we have implemented a hash table that can convert 4 bytes at once. Since the set of keys that need to be converted is known (it consists of all possible 4-byte parts of all recognized header field names), we can precalculate the size of the hash table so that it is synonym-less, i.e. no two keys share a slot. That simplifies (and speeds up) the lookup a lot. The hash table must be initialized upon server startup (function init_hfname_parser).

The header name parser consists of several files, all of them under the parser subdirectory. The main file is parse_hname2.c - this file contains the parser itself and the functions used to initialize and look up the hash table. File keys.h contains an automatically generated set of macros. Each macro is a group of 4 bytes converted to an integer. The macros are used for comparison and for the hash table initialization. For example, for the Max-Forwards header field name, the following macros are defined in the file:
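How a synonym-less table size could be precalculated is sketched below. The hash function (key modulo table size) and the names collision_free and find_table_size are assumptions made for this example; the text does not say which hash function the server uses.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Return 1 if no two keys fall into the same slot when the slot is
 * computed as key % size (an assumed hash function), 0 otherwise.
 * The keys themselves must be distinct.
 */
static int collision_free(const uint32_t *keys, size_t n, uint32_t size)
{
    unsigned char *used = calloc(size, 1);
    size_t i;
    int ok = 1;

    if (used == NULL) return 0;
    for (i = 0; i < n && ok; i++) {
        uint32_t slot = keys[i] % size;
        if (used[slot]) ok = 0;   /* synonym found */
        used[slot] = 1;
    }
    free(used);
    return ok;
}

/*
 * Find the smallest table size (up to limit) for which the given
 * key set has no synonyms. Returns 0 if no such size exists.
 */
static uint32_t find_table_size(const uint32_t *keys, size_t n,
                                uint32_t limit)
{
    uint32_t size;

    for (size = (uint32_t)n; size <= limit; size++) {
        if (size > 0 && collision_free(keys, n, size)) return size;
    }
    return 0;
}
```

Because the key set is fixed, the search needs to run only once, and every later lookup is a single slot access with no collision handling.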

#define _max__ 0x2d78616d   /* "max-" */
#define _maX__ 0x2d58616d   /* "maX-" */
#define _mAx__ 0x2d78416d   /* "mAx-" */
#define _mAX__ 0x2d58416d   /* "mAX-" */
#define _Max__ 0x2d78614d   /* "Max-" */
#define _MaX__ 0x2d58614d   /* "MaX-" */
#define _MAx__ 0x2d78414d   /* "MAx-" */
#define _MAX__ 0x2d58414d   /* "MAX-" */

#define _forw_ 0x77726f66   /* "forw" */
#define _forW_ 0x57726f66   /* "forW" */
#define _foRw_ 0x77526f66   /* "foRw" */
#define _foRW_ 0x57526f66   /* "foRW" */
#define _fOrw_ 0x77724f66   /* "fOrw" */
#define _fOrW_ 0x57724f66   /* "fOrW" */
#define _fORw_ 0x77524f66   /* "fORw" */
#define _fORW_ 0x57524f66   /* "fORW" */
#define _Forw_ 0x77726f46   /* "Forw" */
#define _ForW_ 0x57726f46   /* "ForW" */
#define _FoRw_ 0x77526f46   /* "FoRw" */
#define _FoRW_ 0x57526f46   /* "FoRW" */
#define _FOrw_ 0x77724f46   /* "FOrw" */
#define _FOrW_ 0x57724f46   /* "FOrW" */
#define _FORw_ 0x77524f46   /* "FORw" */
#define _FORW_ 0x57524f46   /* "FORW" */

#define _ards_ 0x73647261   /* "ards" */
#define _ardS_ 0x53647261   /* "ardS" */
#define _arDs_ 0x73447261   /* "arDs" */
#define _arDS_ 0x53447261   /* "arDS" */
#define _aRds_ 0x73645261   /* "aRds" */
#define _aRdS_ 0x53645261   /* "aRdS" */
#define _aRDs_ 0x73445261   /* "aRDs" */
#define _aRDS_ 0x53445261   /* "aRDS" */
#define _Ards_ 0x73647241   /* "Ards" */
#define _ArdS_ 0x53647241   /* "ArdS" */
#define _ArDs_ 0x73447241   /* "ArDs" */
#define _ArDS_ 0x53447241   /* "ArDS" */
#define _ARds_ 0x73645241   /* "ARds" */
#define _ARdS_ 0x53645241   /* "ARdS" */
#define _ARDs_ 0x73445241   /* "ARDs" */
#define _ARDS_ 0x53445241   /* "ARDS" */

As you can see, the Max-Forwards name was divided into three 4-byte chunks: Max-, Forw and ards. The file contains macros for every possible combination of lower and upper case characters in each chunk. Because the name (and therefore the chunks) can contain a colon (":"), a minus or a space, and these characters are not allowed in a macro name, they must be substituted: a colon is replaced by "1", a minus by an underscore ("_") and a space by "2".
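The naming rule and the enumeration of case combinations can be sketched like this. The generator that actually produces keys.h is not shown in this text, so key_name, key_value and print_keys are hypothetical names and the output format only imitates the listing above.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>

/* Macro name for a chunk: '_' + chunk + '_', with ':' -> '1',
 * '-' -> '_' and ' ' -> '2'. out must hold 7 bytes. */
static void key_name(const char chunk[4], char out[7])
{
    int i;

    out[0] = '_';
    for (i = 0; i < 4; i++) {
        char c = chunk[i];
        if (c == ':') c = '1';
        else if (c == '-') c = '_';
        else if (c == ' ') c = '2';
        out[i + 1] = c;
    }
    out[5] = '_';
    out[6] = '\0';
}

/* Integer value of a chunk, first character in the lowest 8 bits. */
static uint32_t key_value(const char chunk[4])
{
    const unsigned char *u = (const unsigned char *)chunk;

    return (uint32_t)u[0] | ((uint32_t)u[1] << 8)
         | ((uint32_t)u[2] << 16) | ((uint32_t)u[3] << 24);
}

/* Print one #define per case combination; returns how many were printed. */
static int print_keys(const char chunk[4], FILE *f)
{
    int mask, i, count = 0;

    for (mask = 0; mask < 16; mask++) {
        char v[5], name[7];
        int skip = 0;

        for (i = 0; i < 4; i++) {
            unsigned char c = (unsigned char)chunk[i];
            int upper = (mask >> (3 - i)) & 1; /* bit 3 = first char */

            if (!isalpha(c)) {
                if (upper) skip = 1; /* no upper case form exists */
                v[i] = (char)c;
            } else {
                v[i] = (char)(upper ? toupper(c) : tolower(c));
            }
        }
        if (skip) continue;
        v[4] = '\0';
        key_name(v, name);
        fprintf(f, "#define %s 0x%08x   /* \"%s\" */\n",
                name, (unsigned)key_value(v), v);
        count++;
    }
    return count;
}
```

A chunk with three letters, such as "max-", yields 8 macros, and a chunk with four letters, such as "forw", yields 16, which matches the listing above.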

When the hash table is initialized, all these macros are used as keys. For each chunk, one of its upper and lower case combinations is used as the value. Which one?

There is a convention that each word of a header field name starts with an upper case character. For example, most user agents will send "Max-Forwards"; messages containing some other combination of upper and lower case characters (for example "max-forwards", "MAX-FORWARDS" or "mAX-fORWARDS") are possible but very rare.

With that in mind, we optimized the parser for the most common case. When all header field names follow the convention, there is no need to do hash table lookups, which is another speed-up.

For example, suppose we are trying to figure out whether the header field name is Max-Forwards, and the name is formed according to the convention (i.e. "Max-Forwards"):

As you can see, there is no need to do hash table lookups if the header field name was formed according to the convention, and the comparison is very fast (only 3 comparisons needed).

Now let's consider another example, in which the header field name was not formed according to the convention, for example "MAX-forwards":

  • Get the first 4 bytes of the header field name ("MAX-"), convert them to an integer and compare it to the "_Max__" macro.

    Comparison failed, so look up "MAX-" converted to an integer in the hash table. It was found; the result is "Max-" converted to an integer.

    Compare the result from the hash table to the "_Max__" macro. Comparison succeeded, continue with the next step.

  • Get the next 4 bytes of the header field name ("forw"), convert them to an integer and compare it to the "_Forw_" macro.

    Comparison failed, so look up "forw" converted to an integer in the hash table. It was found; the result is "Forw" converted to an integer.

    Compare the result from the hash table to the "_Forw_" macro. Comparison succeeded, continue with the next step.

  • Get the next 4 bytes of the header field name ("ards"), convert them to an integer and compare it to the "_ards_" macro. Comparison succeeded, continue with the next step.

  • If the following characters are spaces and tabs followed by a colon (or a colon directly, without spaces and tabs), we have found the Max-Forwards header field name and can set the type field to HDR_MAXFORWARDS. Otherwise (characters other than colon, spaces and tabs) it is some other header field, and the type field is set to HDR_OTHER.

In this example, we had to do 2 hash table lookups and 2 more comparisons. Even this variant is still very fast, because the hash table is synonym-less and lookups are therefore cheap.
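Both walkthroughs can be condensed into a small sketch. It is not the code of parse_hname2: the lookup function below is a slow stand-in for the precomputed hash table, the lookups counter exists only to make the number of fallback lookups visible, and the numeric values of HDR_OTHER and HDR_MAXFORWARDS are placeholders.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

#define _Max__ 0x2d78614d   /* "Max-" */
#define _Forw_ 0x77726f46   /* "Forw" */
#define _ards_ 0x73647261   /* "ards" */

enum { HDR_OTHER = 0, HDR_MAXFORWARDS = 1 };  /* placeholder values */

static int lookups;  /* number of fallback lookups, illustration only */

static const uint32_t canon[] = { _Max__, _Forw_, _ards_ };

static uint32_t read4(const char *p)
{
    const unsigned char *u = (const unsigned char *)p;
    return (uint32_t)u[0] | ((uint32_t)u[1] << 8)
         | ((uint32_t)u[2] << 16) | ((uint32_t)u[3] << 24);
}

/* Lower-case each of the 4 bytes packed in v. */
static uint32_t lower4(uint32_t v)
{
    uint32_t r = 0;
    int i;
    for (i = 0; i < 4; i++) {
        unsigned char c = (unsigned char)(v >> (8 * i));
        r |= (uint32_t)(unsigned char)tolower(c) << (8 * i);
    }
    return r;
}

/* Stand-in for the hash table: map any case variant of a known
 * chunk to its conventional form, or return 0 if unknown. */
static uint32_t lookup(uint32_t v)
{
    size_t i;
    lookups++;
    for (i = 0; i < sizeof(canon) / sizeof(canon[0]); i++) {
        if (lower4(canon[i]) == lower4(v)) return canon[i];
    }
    return 0;
}

static int chunk_is(const char *p, uint32_t key)
{
    uint32_t v = read4(p);
    if (v == key) return 1;     /* conventional case: no lookup */
    return lookup(v) == key;    /* other case: one lookup */
}

static int header_type(const char *name, size_t len)
{
    size_t i = 12;

    if (len < 13) return HDR_OTHER;
    if (!chunk_is(name, _Max__) || !chunk_is(name + 4, _Forw_)
        || !chunk_is(name + 8, _ards_)) return HDR_OTHER;
    while (i < len && (name[i] == ' ' || name[i] == '\t')) i++;
    return (i < len && name[i] == ':') ? HDR_MAXFORWARDS : HDR_OTHER;
}
```

For "Max-Forwards:" the counter stays at 0, and for "MAX-forwards:" it ends at 2, matching the two walkthroughs.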

The Header Field Body Parsers

To HF Body Parser

The purpose of this parser is to parse the body of a To header field. The parser can be found in file parse_to.c under the parser subdirectory.

The main function is parse_to, but there is no need to call it explicitly. Every time the parser finds a To header field, this function is called automatically. The result of the parser is a to_body structure. A pointer to the structure is stored in the parsed field of the hdr_field structure. Since the pointer is void*, there is a convenience macro get_to in file parse_to.h that does the necessary type-casting and returns a pointer to the to_body structure.

The parser itself is a finite state machine that parses the To body according to the grammar defined in RFC3261 and stores the result in the to_body structure.

The parser gets called automatically from function get_hdr_field in file msg_parser.c. The function first creates and initializes an instance of the to_body structure, then calls the parse_to function with the structure as a parameter and, if everything went OK, puts the pointer to the structure in the parsed field of the hdr_field structure representing the parsed To header field.

The newly created structure is freed when the message is destroyed; see function clean_hdr_field in file hf.c for more details.

Structure to_body

The structure represents a parsed To body. It is declared in file parse_to.h.

Structure Declaration:

struct to_param{
    int type;              /* Type of parameter */
    str name;              /* Name of parameter */
    str value;             /* Parameter value */
    struct to_param* next; /* Next parameter in the list */
};


struct to_body{
    int error;                    /* Error code */
    str body;                     /* The whole header field body */
    str uri;                      /* URI */
    str tag_value;                /* Value of tag */
    struct to_param *param_lst;   /* Linked list of parameters */
    struct to_param *last_param;  /* Last parameter in the list */
};

Structure to_param is a temporary structure representing a To URI parameter. Right now only the TAG parameter is marked in the type field. All other parameters have the same type.
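Reading the parsed To body could look roughly like the sketch below. The str layout (pointer plus length), the one-field hdr_field and the macro GET_TO_SKETCH are simplified stand-ins written for this example; the real get_to macro in parse_to.h may take a different argument.

```c
#include <stddef.h>

/* Assumed layout: pointer to the first character plus length. */
typedef struct { char *s; int len; } str;

struct to_param {
    int type;
    str name;
    str value;
    struct to_param *next;
};

struct to_body {
    int error;
    str body;
    str uri;
    str tag_value;
    struct to_param *param_lst;
    struct to_param *last_param;
};

/* Only the field needed here; the real structure has more. */
struct hdr_field { void *parsed; };

/* Hypothetical cast macro in the spirit of get_to. */
#define GET_TO_SKETCH(hf) ((struct to_body *)(hf)->parsed)

/* Count the parameters of a parsed To body; -1 if not parsed. */
static int count_to_params(const struct hdr_field *hf)
{
    const struct to_body *to = GET_TO_SKETCH(hf);
    const struct to_param *p;
    int n = 0;

    if (to == NULL) return -1;
    for (p = to->param_lst; p != NULL; p = p->next) n++;
    return n;
}

/* Return 1 if the To body carries a non-empty tag. */
static int has_tag(const struct hdr_field *hf)
{
    const struct to_body *to = GET_TO_SKETCH(hf);

    return to != NULL && to->tag_value.len > 0;
}
```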

Field Description:

CSeq HF Body Parser

The purpose of this parser is to parse the body of a CSeq header field. The parser can be found in file parse_cseq.c under the parser subdirectory.

The main function is parse_cseq, but there is no need to call it explicitly. Every time the parser finds a CSeq header field, this function is called automatically. The result of the parser is a cseq_body structure. A pointer to the structure is stored in the parsed field of the hdr_field structure. Since the pointer is void*, there is a convenience macro get_cseq in file parse_cseq.h that does the necessary type-casting and returns a pointer to the cseq_body structure.

The parser parses the CSeq body according to the grammar defined in RFC3261 and stores the result in the cseq_body structure.

The parser gets called automatically from function get_hdr_field in file msg_parser.c. The function first creates and initializes an instance of the cseq_body structure, then calls the parse_cseq function with the structure as a parameter and, if everything went OK, puts the pointer to the structure in the parsed field of the hdr_field structure representing the parsed CSeq header field.

The newly created structure is freed when the message is destroyed; see function clean_hdr_field in file hf.c for more details.

Event HF Body Parser

The purpose of this parser is to parse the body of an Event header field. The parser can be found in file parse_event.c under the parser subdirectory.

Note

This is NOT a fully featured Event body parser! The parser was written for the Presence Agent module only and thus recognizes the Presence package only. No subpackages are recognized. All other packages are marked as "OTHER".

The parser should be replaced by a more generic parser if subpackages or parameters need to be parsed too.

The main function is parse_event in file parse_event.c. The function creates an instance of the event_t structure and calls the parser. If everything went OK, a pointer to the newly created structure is stored in the parsed field of the hdr_field structure representing the parsed header field.

As usual, the newly created structure is freed when the whole message is destroyed. See function clean_hdr_field in file hf.c.

The parser is not called automatically when the main parser finds an Event header field. It is up to you to call the parser (function parse_event) when you really need the body of the header field to be parsed.

Contact HF Body Parser

The parser is located under the parser/contact subdirectory. The parser is not called automatically when the main parser finds a Contact header field. It is your responsibility to call the parser if you want a Contact header field body to be parsed.

The main function is parse_contact in file parse_contact.c. The function accepts one parameter, the hdr_field structure representing the header field to be parsed. A single Contact header field may contain multiple contacts; the parser parses all of them and creates a linked list of all such contacts.

The function creates and initializes an instance of the contact_body structure and then calls function contact_parser. If everything went OK, a pointer to the newly created structure is stored in the parsed field of the hdr_field structure representing the parsed header field.

Function contact_parser checks whether the contact is a star; if not, it calls the parse_contacts function, which parses all contacts of the header field.

Function parse_contacts can be found in file contact.c. It extracts the URI and parses all contact parameters.

The Contact parameter parser can be found in file cparam.c.

The following structures will be created during parsing:

Note

Mind that none of the strings in the following structures is zero terminated! Be very careful when processing the strings with functions that require zero termination (printf for example)!
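Two common ways to handle such strings safely are sketched below: printing with an explicit precision, and copying into a zero-terminated buffer. The str layout (a pointer named s and a length named len) is an assumption for this example, and str_print and str_to_cstr are hypothetical helper names.

```c
#include <stdio.h>
#include <string.h>

/* Assumed layout: pointer to the first character plus length. */
typedef struct { char *s; int len; } str;

/* Print exactly len characters; "%.*s" stops at the given
 * precision, so no terminating zero is needed. */
static int str_print(FILE *f, const str *v)
{
    return fprintf(f, "%.*s", v->len, v->s);
}

/* Copy into a zero-terminated buffer for functions that need one.
 * Returns 0 on success, -1 if the buffer is too small. */
static int str_to_cstr(const str *v, char *buf, size_t size)
{
    if (v->len < 0 || (size_t)v->len >= size) return -1;
    memcpy(buf, v->s, (size_t)v->len);
    buf[v->len] = '\0';
    return 0;
}
```

The first form avoids a copy and is usually enough for logging; the second is needed when a library function insists on a terminated string.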

typedef struct contact_body {
    unsigned char star;    /* Star contact */
    contact_t* contacts;   /* List of contacts */
} contact_body_t;

This is the main structure. A pointer to an instance of this structure is stored in the parsed field of the structure representing the header field to be parsed. The structure contains two fields:

typedef struct contact {
    str uri;              /* contact uri */
    cparam_t* q;          /* q parameter hook */
    cparam_t* expires;    /* expires parameter hook */
    cparam_t* method;     /* method parameter hook */
    cparam_t* params;     /* List of all parameters */
    struct contact* next; /* Next contact in the list */
} contact_t;

This structure represents one contact (mind that there might be several contacts in one Contact header field, delimited by commas). Its fields have the following meaning:

  • uri - This field contains a pointer to the beginning of the URI and its length.

  • q - This is a hook to the structure representing the q parameter. If there is no such parameter, the hook contains 0.

  • expires - This is a hook to the structure representing the expires parameter. If there is no such parameter, the hook contains 0.

  • method - This is a hook to the structure representing the method parameter. If there is no such parameter, the hook contains 0.

  • params - Linked list of all parameters.

  • next - Pointer to the next contact that was in the same header field.

typedef enum cptype {
    CP_OTHER = 0,  /* Unknown parameter */
    CP_Q,          /* Q parameter */
    CP_EXPIRES,    /* Expires parameter */
    CP_METHOD      /* Method parameter */
} cptype_t;

This is an enum of recognized types of contact parameters. Q parameter will have type set to CP_Q, Expires parameter will have type set to CP_EXPIRES and Method parameter will have type set to CP_METHOD. All other parameters will have type set to CP_OTHER.

/*
 * Structure representing a contact parameter
 */
typedef struct cparam {
    cptype_t type;       /* Type of the parameter */
    str name;            /* Parameter name */
    str body;            /* Parameter body */
    struct cparam* next; /* Next parameter in the list */
} cparam_t;

This structure represents a contact parameter. Field description follows:

  • type - Type of the parameter, see cptype enum for more details.

  • name - Name of the parameter (i.e. the part before "=").

  • body - Body of the parameter (i.e. the part after "=").

  • next - Next parameter in the linked list.
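Walking the structures described above might look like the following sketch. The declarations are copied from this section; the str layout and the helper names count_contacts and contact_expires are assumptions made for the example.

```c
#include <stddef.h>

/* Assumed layout: pointer to the first character plus length. */
typedef struct { char *s; int len; } str;

typedef enum cptype {
    CP_OTHER = 0, CP_Q, CP_EXPIRES, CP_METHOD
} cptype_t;

typedef struct cparam {
    cptype_t type;
    str name;
    str body;
    struct cparam *next;
} cparam_t;

typedef struct contact {
    str uri;
    cparam_t *q;
    cparam_t *expires;
    cparam_t *method;
    cparam_t *params;
    struct contact *next;
} contact_t;

typedef struct contact_body {
    unsigned char star;
    contact_t *contacts;
} contact_body_t;

/* Number of contacts in the body, or -1 for a star contact. */
static int count_contacts(const contact_body_t *b)
{
    const contact_t *c;
    int n = 0;

    if (b->star) return -1;
    for (c = b->contacts; c != NULL; c = c->next) n++;
    return n;
}

/* Body of the expires parameter, or NULL if the hook is empty. */
static const str *contact_expires(const contact_t *c)
{
    return c->expires != NULL ? &c->expires->body : NULL;
}
```

The hooks make it unnecessary to scan the params list for the three well-known parameters.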

Digest Body Parser

The purpose of this parser is to parse a digest response. The parser can be found under the parser/digest subdirectory. There might be several header fields containing a digest response, for example Authorization or Proxy-Authorization. The parser can be used for all of them.

The parser is not called automatically by the main parser. It is your responsibility to call the parser when you want a digest response to be parsed.

The main function is parse_credentials, defined in digest.c. The function accepts one parameter, the header field to be parsed. As a result, the function creates an instance of the auth_body_t structure, which represents the parsed digest credentials. A pointer to the structure is put in the parsed field of the hdr_field structure representing the parsed header field. It is freed when the whole message is destroyed.

The digest parser contains a 32-bit digest parameter parser. It works in the same way as the parser described in section The Header Field Name Parser; see that section for details of the algorithm.

A description of the digest related structures follows:

typedef struct auth_body {
    /* This is pointer to header field containing
     * parsed authorized digest credentials. This
     * pointer is set in sip_msg->{authorization,proxy_auth}
     * hooks.
     *
     * This is necessary for functions called after
     * {www,proxy}_authorize, these functions need to know
     * which credentials are authorized and they will simply
     * look into 
     * sip_msg->{authorization,proxy_auth}->parsed->authorized
     */
    struct hdr_field* authorized;
    dig_cred_t digest;           /* Parsed digest credentials */
    unsigned char stale;         /* Flag is set if nonce is stale */
    int nonce_retries;           /* How many times the nonce was used */
} auth_body_t;

This is the "main" structure. A pointer to the structure is stored in the parsed field of the hdr_field structure. A detailed description of its fields follows:

  • authorized - This is a hook to the header field containing authorized credentials.

    A SIP message may contain several credentials. They are distinguished using the realm parameter. When the server is trying to authorize the message, it must first find the credentials with the corresponding realm and then authorize them. To authorize credentials, the server calculates a response string, and if the string matches the response string contained in the credentials, the credentials are authorized (in fact it means that the user specified in the credentials knows the password, nothing more, nothing less).

    It is a good idea to remember which credentials contained in the message are authorized, because other functions might be interested in knowing it.

    That is what this field is for. A function that successfully authorized credentials (currently there is only one such function in the server, function authorize in the auth module) puts a pointer to the header field containing the authorized credentials in this field. Because there might be several header fields containing credentials, the pointer is put in the authorized field of the first header field in the message that contains credentials. That is the header field whose pointer is in the authorization or proxy_auth field of the sip_msg structure representing the message.

    When a function wants to find authorized credentials, it simply looks in msg->authorization->parsed->authorized or msg->proxy_auth->parsed->authorized, where msg is a variable containing a pointer to the sip_msg structure.

    To simplify the task of saving and retrieving the pointer to authorized credentials, there are two convenience functions defined in file digest.c. They are described later.

  • digest - Structure containing parsed digest credentials. The structure will be described in detail later.

  • stale - This field is set to 1 if the server received a stale nonce. The next time the server sends a challenge, it will use the "stale=true" parameter. "stale=true" indicates to the client that the username and password used to calculate the response were correct, but the nonce was stale. The client should recalculate the response with the same username and password (without disturbing the user) and the new nonce. For more details see RFC2617.

  • nonce_retries - This field indicates the number of authorization attempts with the same nonce.

/*
 * Errors returned by check_dig_cred
 */
typedef enum dig_err {
    E_DIG_OK = 0,        /* Everything is OK */
    E_DIG_USERNAME  = 1, /* Username missing */
    E_DIG_REALM = 2,     /* Realm missing */
    E_DIG_NONCE = 4,     /* Nonce value missing */
    E_DIG_URI = 8,       /* URI missing */
    E_DIG_RESPONSE = 16, /* Response missing */
    E_DIG_CNONCE = 32,   /* CNONCE missing */
    E_DIG_NC = 64,       /* Nonce-count missing */
} dig_err_t;

This is enum of all possible errors returned by check_dig_cred function.

  • E_DIG_OK - No error found.

  • E_DIG_USERNAME - Username parameter missing in digest response.

  • E_DIG_REALM - Realm parameter missing in digest response.

  • E_DIG_NONCE - Nonce parameter missing in digest response.

  • E_DIG_URI - Uri parameter missing in digest response.

  • E_DIG_RESPONSE - Response parameter missing in digest response.

  • E_DIG_CNONCE - Cnonce parameter missing in digest response.

  • E_DIG_NC - Nc parameter missing in digest response.
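Each error code occupies its own bit, which suggests that several of them can be combined in one return value. The sketch below works under that assumption. It is not the real check_dig_cred: cred_sketch is a hypothetical subset of the credentials, only five parameters are tested, and the str layout is assumed.

```c
#include <stddef.h>

/* Assumed layout: pointer to the first character plus length. */
typedef struct { char *s; int len; } str;

typedef enum dig_err {
    E_DIG_OK = 0,
    E_DIG_USERNAME = 1,
    E_DIG_REALM = 2,
    E_DIG_NONCE = 4,
    E_DIG_URI = 8,
    E_DIG_RESPONSE = 16,
    E_DIG_CNONCE = 32,
    E_DIG_NC = 64
} dig_err_t;

/* Hypothetical subset of the parsed credentials. */
struct cred_sketch {
    str username;
    str realm;
    str nonce;
    str uri;
    str response;
};

static int is_empty(const str *v)
{
    return v->s == NULL || v->len <= 0;
}

/* Collect one flag per missing parameter into a single mask. */
static int missing_mask(const struct cred_sketch *c)
{
    int res = E_DIG_OK;

    if (is_empty(&c->username)) res |= E_DIG_USERNAME;
    if (is_empty(&c->realm))    res |= E_DIG_REALM;
    if (is_empty(&c->nonce))    res |= E_DIG_NONCE;
    if (is_empty(&c->uri))      res |= E_DIG_URI;
    if (is_empty(&c->response)) res |= E_DIG_RESPONSE;
    return res;
}
```

A caller can then test individual problems with a bitwise AND, for example (res & E_DIG_REALM), or compare the whole value with E_DIG_OK.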

/* Type of algorithm used */
typedef enum alg {
    ALG_UNSPEC = 0,   /* Algorithm parameter not specified */
    ALG_MD5 = 1,      /* MD5 - default value*/
    ALG_MD5SESS = 2,  /* MD5-Session */
    ALG_OTHER = 4     /* Unknown */
} alg_t;

This is enum of recognized algorithm types. (See description of algorithm structure for more details).

  • ALG_UNSPEC - Algorithm was not specified in digest response.

  • ALG_MD5 - "algorithm=MD5" was found in digest response.

  • ALG_MD5SESS - "algorithm=MD5-sess" was found in digest response.

  • ALG_OTHER - Unknown algorithm parameter value was found in digest response.

/* Quality Of Protection used */
typedef enum qop_type { 
    QOP_UNSPEC = 0,   /* QOP parameter not present in response */
    QOP_AUTH = 1,     /* Authentication only */
    QOP_AUTHINT = 2,  /* Authentication with integrity checks */
    QOP_OTHER = 4     /* Unknown */
} qop_type_t;

This enum lists all recognized qop parameter values.

  • QOP_UNSPEC - qop parameter was not found in digest response.

  • QOP_AUTH - "qop=auth" was found in digest response.

  • QOP_AUTHINT - "qop=auth-int" was found in digest response.

  • QOP_OTHER - Unknown qop parameter value was found in digest response.

/* Algorithm structure */
struct algorithm {
    str alg_str;       /* The original string representation */
    alg_t alg_parsed;  /* Parsed value */
};

The structure represents "algorithm" parameter of digest response. Description of fields follows:

  • alg_str - Algorithm parameter value as string.

  • alg_parsed - Parsed algorithm parameter value.

/* QOP structure */
struct qp {
    str qop_str;           /* The original string representation */
    qop_type_t qop_parsed; /* Parsed value */
};

This structure represents "qop" parameter of digest response. Description of fields follows:

  • qop_str - Qop parameter value as string.

  • qop_parsed - Parsed "qop" parameter value.
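How the string form might be mapped to the parsed value is sketched below. The helper names are hypothetical, the str layout is assumed, and the case-insensitive match is a choice made for this example. The token spellings "MD5", "MD5-sess", "auth" and "auth-int" are the ones defined in RFC 2617.

```c
#include <ctype.h>
#include <stddef.h>

/* Assumed layout: pointer to the first character plus length. */
typedef struct { char *s; int len; } str;

typedef enum alg {
    ALG_UNSPEC = 0, ALG_MD5 = 1, ALG_MD5SESS = 2, ALG_OTHER = 4
} alg_t;

typedef enum qop_type {
    QOP_UNSPEC = 0, QOP_AUTH = 1, QOP_AUTHINT = 2, QOP_OTHER = 4
} qop_type_t;

struct algorithm { str alg_str; alg_t alg_parsed; };
struct qp { str qop_str; qop_type_t qop_parsed; };

/* Case-insensitive comparison of a counted string with a literal. */
static int str_ieq(const str *v, const char *lit)
{
    int i;

    for (i = 0; i < v->len; i++) {
        if (lit[i] == '\0') return 0;
        if (tolower((unsigned char)v->s[i])
            != tolower((unsigned char)lit[i])) return 0;
    }
    return lit[i] == '\0';
}

/* Fill in alg_parsed from alg_str. */
static void classify_alg(struct algorithm *a)
{
    if (a->alg_str.len <= 0) a->alg_parsed = ALG_UNSPEC;
    else if (str_ieq(&a->alg_str, "MD5")) a->alg_parsed = ALG_MD5;
    else if (str_ieq(&a->alg_str, "MD5-sess")) a->alg_parsed = ALG_MD5SESS;
    else a->alg_parsed = ALG_OTHER;
}

/* Fill in qop_parsed from qop_str. */
static void classify_qop(struct qp *q)
{
    if (q->qop_str.len <= 0) q->qop_parsed = QOP_UNSPEC;
    else if (str_ieq(&q->qop_str, "auth")) q->qop_parsed = QOP_AUTH;
    else if (str_ieq(&q->qop_str, "auth-int")) q->qop_parsed = QOP_AUTHINT;
    else q->qop_parsed = QOP_OTHER;
}
```

Keeping both the original string and the parsed value lets later code switch on the enum while still having the exact text available.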

/*
 * Parsed digest credentials
 */
typedef struct dig_cred {
    str username;         /* Username */
    str realm;            /* Realm */
    str nonce;            /* Nonce value */
    str uri;              /* URI */
    str response;         /* Response string */
    str algorithm;        /* Algorithm in string representation */
    struct algorithm alg; /* Type of algorithm used */
    str cnonce;           /* Cnonce value */
    str opaque;           /* Opaque data string */
    struct qp qop;        /* Quality Of Protection */
    str nc;               /* Nonce count parameter */
} dig_cred_t;

This structure represents a set of digest credentials parameters. A description of its fields follows:

  • username - Value of "username" parameter.

  • realm - Value of "realm" parameter.

  • nonce - Value of "nonce" parameter.

  • uri - Value of "uri" parameter.

  • response - Value of "response" parameter.

  • algorithm - Value of "algorithm" parameter as string.

  • alg - Parsed value of "algorithm" parameter.

  • cnonce - Value of "cnonce" parameter.

  • opaque - Value of "opaque" parameter.

  • qop - Value of "qop" parameter.

  • nc - Value of "nc" parameter.

Other Functions Of the Digest Body Parser

There are some other functions, mainly convenience functions, defined in the parser. They are described in detail in this section. All of them are defined in file digest.c.

Function check_dig_cred performs some basic sanity checks on parsed digest credentials. The following conditions must be met for the checks to be successful:

Note

It is recommended to call check_dig_cred before you try to authorize the credentials. If the function fails, there is no need to try to authorize the credentials, because the authorization is certain to fail.

Function mark_authorized_cred is a convenience function that saves a pointer to the authorized credentials. For more information see the description of the authorized field in the auth_body structure.

The other convenience function retrieves the pointer to authorized credentials previously saved using the mark_authorized_cred function. If there are no such credentials, 0 is stored in the variable pointed to by the second parameter. The function always returns zero. For more information see the description of the authorized field in the auth_body structure.
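The save and retrieve pair could be sketched as follows. The real signatures in digest.c are not given in this text, so mark_cred_sketch and get_cred_sketch, their parameters and the reduced structures are all hypothetical. Only the behaviour described above is reproduced: the pointer is kept in the first credentials header field, and the retrieve function always returns zero.

```c
#include <stddef.h>

/* Reduced stand-ins for the structures described in this section. */
struct hdr_field { void *parsed; };
struct auth_body_sketch { struct hdr_field *authorized; };

/*
 * Remember which header field holds the authorized credentials.
 * first is the first credentials header field of the message.
 * Returns 0 on success, -1 if first has not been parsed.
 */
static int mark_cred_sketch(struct hdr_field *first,
                            struct hdr_field *authorized)
{
    struct auth_body_sketch *b = first != NULL ? first->parsed : NULL;

    if (b == NULL) return -1;
    b->authorized = authorized;
    return 0;
}

/*
 * Retrieve the pointer saved by mark_cred_sketch. Stores NULL in
 * *out if nothing was saved. Always returns zero.
 */
static int get_cred_sketch(struct hdr_field *first,
                           struct hdr_field **out)
{
    struct auth_body_sketch *b = first != NULL ? first->parsed : NULL;

    *out = b != NULL ? b->authorized : NULL;
    return 0;
}
```

Since the return value carries no information, a caller has to inspect the output pointer to learn whether authorized credentials exist.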