Geesh_SO

Reputation: 2205

How can I most effectively parse this HTTP request in C?

The information I really need to extract is:

a) Whether or not it is a GET request

b) The file address (e.g. index.html)

c) The host information (e.g. localhost:8081)

I have code to do this just now (see bottom of my post), but it seems inefficient, quite static, and doesn't pull the host information.

So I'd like to have a sane solution to parsing the HTTP request in C. Cheers!

HTTP Request

GET /index.html HTTP/1.1
Host: localhost:8081
Connection: keep-alive
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.70 Safari/537.17
DNT: 1
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8,en-GB;q=0.6
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3

Current Code

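int parsehttp(char *inputstring, int *type, char *getaddress) {
    if((strncmp(inputstring, "GET", 3)) == 0) {
        *type = 1;
    } else {
        *type = 0;
    }
    char firstline[BUFLEN] = "";
    int charoffset = getlineend(inputstring); //this function returns the int offset of '\r\n'
    //the -2 below assumes the offset points just past '\r\n'
    if(charoffset < 2 || charoffset - 2 >= BUFLEN) {
        return 0; //request line missing or too long for firstline
    }
    strncpy(firstline, inputstring, charoffset-2);
    firstline[charoffset-2] = '\0'; //terminate directly after the copied bytes
    sscanf(firstline,"%*s %s %*s",getaddress); //getaddress must hold at least BUFLEN bytes
    inputstring = (inputstring + charoffset); //no effect for the caller: inputstring is a local copy
    return 1;
}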

Upvotes: 2

Views: 10850

Answers (2)

Bernd Elkemann

Reputation: 23560

You should not worry about this being inefficient: it is networking after all, and the network will always be many orders of magnitude slower than your CPU, cache, and RAM.

If you are writing an HTTP server, the only things you should care about are memory safety and what your code does if the client sends something unexpected.

Some examples: what does your code (and the code that follows it or depends on its parsing) do if:

  • the client sends more than 10 MB of data, all malformed, e.g. with no line breaks at all.
  • the client sends invalid decimal numbers (in the IP address, port, or Content-Length).
  • the client sends correct data but maliciously slowly, e.g. 1 byte per second.
  • ... and much, much more.
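
The first two cases can be guarded against with a bounded scan and a strict number parser. The following is only a minimal sketch: MAX_LINE, find_line_end and parse_port are hypothetical names chosen for illustration, and the 8192-byte cap is an arbitrary assumption, not a value taken from any standard.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE 8192  /* hypothetical cap on the length of one line */

/* Returns the length of the first line (excluding "\r\n"), or -1 if no
 * "\r\n" is found within the first MAX_LINE bytes of buf[0..len). */
long find_line_end(const char *buf, size_t len) {
    size_t limit = len < MAX_LINE ? len : MAX_LINE;
    size_t i;

    for (i = 0; i + 1 < limit; i++) {
        if (buf[i] == '\r' && buf[i + 1] == '\n') {
            return (long)i;
        }
    }
    return -1;  /* no line break yet, or the line is too long: reject */
}

/* Parses a decimal port in the range 1..65535.
 * Returns -1 for anything else (empty, signed, trailing junk, overflow). */
int parse_port(const char *s) {
    char *end;
    long v;

    if (!isdigit((unsigned char)*s)) {
        return -1;  /* strtol alone would accept leading spaces and signs */
    }
    errno = 0;
    v = strtol(s, &end, 10);
    if (errno != 0 || *end != '\0' || v < 1 || v > 65535) {
        return -1;
    }
    return (int)v;
}
```

The slow-client case is not covered here; it would typically need a receive timeout on the socket rather than anything in the parser.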

Upvotes: 0

bash.d

Reputation: 13217

The strstr function might help you. It tries to locate a given substring within a string you provide. An HTTP request consists of lines, each ending in CR LF (0x0D, 0x0A), so you can split the request into lines. Fields on a line are usually separated by whitespace. So to find "GET" or "POST" you can use

char* getpost = strstr("GET /index.html HTTP/1.1", "GET");

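If getpost != NULL, the string was found and you can cut the line after either GET or POST. Note that strstr matches anywhere in the string, so also check that getpost points at the start of the line.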
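
As a rough sketch of that first step, the function below checks for GET at the start of the request line and copies out the file address. The name get_request_path and its parameters are made up for this example, and it assumes the request line has already been split off and NUL-terminated.

```c
#include <string.h>

/* Copies the request target (e.g. "/index.html") of a GET request line
 * into out. Returns 1 on success, 0 if the line is not a well-formed GET
 * request line or the target does not fit in out_size bytes. */
int get_request_path(const char *line, char *out, size_t out_size) {
    const char *start;
    const char *end;
    size_t n;

    /* strstr finds "GET " anywhere, so require the match at the start. */
    if (strstr(line, "GET ") != line) {
        return 0;
    }
    start = line + 4;              /* first byte after "GET " */
    end = strchr(start, ' ');      /* the space before "HTTP/1.1" */
    if (end == NULL) {
        return 0;
    }
    n = (size_t)(end - start);
    if (n == 0 || n >= out_size) {
        return 0;                  /* empty target or buffer too small */
    }
    memcpy(out, start, n);
    out[n] = '\0';
    return 1;
}
```

Called with the request line from the question and a sufficiently large buffer, it should leave "/index.html" in out.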

Secondly, look for "Host: " and skip past it; everything up to the next 0xD,0xA is your host address.
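
The host lookup could be sketched the same way. The function name get_host is hypothetical, and the code assumes the header is spelled exactly "Host: " as in the sample request; HTTP header names are case-insensitive and the space after the colon is optional, so a real server would need a more tolerant match.

```c
#include <string.h>

/* Copies the value of the Host header (e.g. "localhost:8081") into out.
 * request must be a NUL-terminated HTTP request. Returns 1 on success,
 * 0 if the header is missing, empty, unterminated, or too long. */
int get_host(const char *request, char *out, size_t out_size) {
    /* Including the preceding "\r\n" ensures the match begins a line. */
    const char *start = strstr(request, "\r\nHost: ");
    const char *end;
    size_t n;

    if (start == NULL) {
        return 0;
    }
    start += 8;                    /* length of "\r\nHost: " */
    end = strstr(start, "\r\n");   /* end of the header line */
    if (end == NULL) {
        return 0;
    }
    n = (size_t)(end - start);
    if (n == 0 || n >= out_size) {
        return 0;
    }
    memcpy(out, start, n);
    out[n] = '\0';
    return 1;
}
```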

See the strstr man page for details.

Upvotes: 3
