daisy

Reputation: 23501

What's the right way to parse HTTP packets?

I'm creating a simple HTTP server that needs to understand HTTP requests.

But browsers like Chromium use HTTP pipelining, which means multiple HTTP requests can be sent over a single connection.

Now I find it hard to tell where one HTTP request ends and the next begins. One obvious example is a GET request followed by a form upload of random data.

What I'm thinking of right now is to split all the data I receive on \r\n, then check each line to see if it looks like an HTTP request line, e.g. ^(GET|PUT|HEAD|POST|MOVE|TRACE) /[^ ]+ HTTP/[0-9]+\.[0-9]+$

But that could still go wrong, since a line inside an uploaded body might match the pattern too. Any ideas? (Please don't tell me to use an existing HTTP server library... I'm practicing something.)
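To make that concern concrete, here is a small Python sketch (the request and its body are made up for illustration) showing that the regex from the question also matches a line inside an uploaded body, so line-by-line pattern matching cannot find request boundaries on its own.

```python
import re

# The request-line pattern from the question, used unchanged.
REQUEST_LINE = re.compile(r"^(GET|PUT|HEAD|POST|MOVE|TRACE) /[^ ]+ HTTP/[0-9]+\.[0-9]+$")

# Hypothetical upload whose body happens to contain a request-like line.
body = b"GET /not-a-request HTTP/1.1\r\nmore upload data"
raw = (
    b"POST /upload HTTP/1.1\r\n"
    + b"Host: example.com\r\n"
    + b"Content-Length: %d\r\n" % len(body)
    + b"\r\n"
    + body
)

# Naive approach: split on CRLF and flag every line that matches the pattern.
lines = raw.decode("latin-1").split("\r\n")
matches = [line for line in lines if REQUEST_LINE.match(line)]
print(matches)  # both the real request line and the line from the body match
```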

Upvotes: 1

Views: 2175

Answers (2)

Adam Rosenfield

Reputation: 400204

Take a good read through RFC 2616, the specification for HTTP/1.1 (since obsoleted by RFCs 7230-7235 and, as of 2022, by RFC 9110 and RFC 9112). An HTTP request consists of these pieces:

  1. Start line
  2. Zero or more header lines
  3. Empty line
  4. Request body (optional)

You start by parsing the start line, which involves reading until the first newline (carriage return and linefeed, CRLF). Then, you read the headers by reading lines until you read an empty line (i.e. two consecutive CRLF pairs).
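A minimal Python sketch of the start-line and header steps, assuming the bytes received so far are kept in one buffer. `parse_head` is a hypothetical helper name; it lowercases header names (they are case-insensitive), keeps only the last value of a repeated header, and does no validation, so treat it as an illustration rather than a complete parser.

```python
def parse_head(buf: bytes):
    """Parse the request line and headers at the start of buf (hypothetical helper).

    Returns (method, target, version, headers, rest), or None if the empty line
    that ends the headers has not arrived yet (read more from the socket).
    """
    end = buf.find(b"\r\n\r\n")  # CRLF ending the last header + the empty line
    if end == -1:
        return None
    head = buf[:end].decode("latin-1")
    rest = buf[end + 4:]  # whatever follows: the body and/or the next request
    request_line, *header_lines = head.split("\r\n")
    method, target, version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Simplification: a repeated header overwrites the earlier value.
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers, rest


result = parse_head(b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n")
print(result)
```

Returning `None` instead of raising keeps the caller's loop simple: append the next `recv()` result to the buffer and try again.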

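Once you've read the headers, you can determine whether there's a request body by checking for a Content-Length or Transfer-Encoding header. Content-Length gives the body size in bytes, so you read exactly that many bytes (this may take multiple reads from the socket). With Transfer-Encoding: chunked, the total length isn't known up front: the body arrives as a series of chunks, each preceded by its size in hexadecimal, and ends with a zero-size chunk. If both headers are present, Transfer-Encoding takes precedence; if neither is present, the request has no body.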
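A rough Python sketch of the body step, assuming the headers were parsed into a dict with lowercased names and that `data` holds the bytes that followed the empty line. `read_body` and `_read_chunked` are hypothetical names; chunk extensions are ignored, trailer fields are skipped rather than returned, and there are no size limits, all of which a real server would need to handle.

```python
def read_body(headers: dict, data: bytes):
    """Return (body, rest), or None if more bytes must be read (hypothetical helper)."""
    # Transfer-Encoding takes precedence over Content-Length.
    if "chunked" in headers.get("transfer-encoding", "").lower():
        return _read_chunked(data)
    length = int(headers.get("content-length", "0"))  # no header: no body
    if len(data) < length:
        return None
    return data[:length], data[length:]


def _read_chunked(data: bytes):
    body = bytearray()
    pos = 0
    while True:
        line_end = data.find(b"\r\n", pos)
        if line_end == -1:
            return None
        # Chunk-size line: hex digits, optionally followed by ";extensions".
        size = int(data[pos:line_end].split(b";")[0], 16)
        pos = line_end + 2
        if size == 0:
            # Last chunk: skip trailer fields up to the terminating empty line.
            while True:
                line_end = data.find(b"\r\n", pos)
                if line_end == -1:
                    return None
                if line_end == pos:
                    return bytes(body), data[pos + 2:]
                pos = line_end + 2
        if len(data) < pos + size + 2:
            return None  # chunk data and its trailing CRLF not fully received
        if data[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("malformed chunk")
        body += data[pos:pos + size]
        pos += size + 2


print(read_body({"transfer-encoding": "chunked"}, b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\nNEXT"))
```

The `rest` value matters for pipelining: it is the beginning of the next request.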

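After you've read the request body, you're done! You're then ready to read the next request. Note that a single read from the socket may return the end of one request together with the start of the next, so keep any bytes past the end of the body in your buffer rather than discarding them.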
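Putting the steps together, here is a simplified Python sketch of splitting a buffer of pipelined requests. `split_requests` is a hypothetical name, and to stay short it only understands Content-Length bodies (no chunked encoding). Because it uses the declared length instead of pattern matching, a request-like line inside a body is not mistaken for a new request.

```python
def split_requests(buf: bytes):
    """Split buffered pipelined requests into (head, body) pairs (hypothetical helper).

    Sketch only: handles Content-Length bodies, not chunked encoding.
    Returns (requests, leftover); leftover is an incomplete trailing request
    that should be kept and prepended to the next data read from the socket.
    """
    requests = []
    while True:
        end = buf.find(b"\r\n\r\n")
        if end == -1:
            break  # header block not complete yet
        head = buf[:end].decode("latin-1")
        length = 0
        for line in head.split("\r\n")[1:]:  # skip the request line
            name, _, value = line.partition(":")
            if name.strip().lower() == "content-length":
                length = int(value.strip())
        body_start = end + 4
        if len(buf) < body_start + length:
            break  # body not complete yet
        requests.append((head, buf[body_start:body_start + length]))
        buf = buf[body_start + length:]  # remainder starts the next request
    return requests, buf
```

In a server loop you would append each `recv()` result to the leftover and call the function again.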

Upvotes: 7

The way to parse HTTP requests if you are unwilling to use an existing library is to read the RFCs that specify the format of HTTP requests, and then write code to parse data in that format.

Also note that HTTP pipelining, and more generally sending multiple requests over a single connection, are HTTP/1.1 features, and your server never has to support them. It is allowed to read a single request, send an HTTP/1.0 response, and close the connection, and any web browser is expected to handle that gracefully.
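A minimal Python sketch of that one-request-per-connection approach using the standard `socket` module. `handle_one_request` is a hypothetical name; it ignores any request body and always returns the same plain-text reply. Since the server closes the connection after responding, the client has to open a new connection for its next request, so there is never a second request to separate from the first.

```python
import socket


def handle_one_request(conn: socket.socket) -> bytes:
    """Read one request head, reply with HTTP/1.0, then close (hypothetical helper).

    Sketch: ignores any request body and does no routing.
    Returns the raw request head so the caller can inspect or log it.
    """
    buf = b""
    while b"\r\n\r\n" not in buf:
        chunk = conn.recv(4096)
        if not chunk:
            break  # client closed before finishing the headers
        buf += chunk
    head = buf.split(b"\r\n\r\n", 1)[0]
    body = b"hello\n"
    response = (
        b"HTTP/1.0 200 OK\r\n"
        + b"Content-Type: text/plain\r\n"
        + b"Content-Length: %d\r\n" % len(body)
        + b"Connection: close\r\n"  # redundant for HTTP/1.0, but makes the intent explicit
        + b"\r\n"
        + body
    )
    conn.sendall(response)
    conn.close()  # closing signals that no further responses will follow
    return head
```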

Upvotes: 1
