Reputation: 89
I am trying to extract the Content-Length value as an integer from a string of HTTP headers. Our current approach works for some websites but fails for larger files (over 9999 bytes), because it always slices exactly four characters. How can we find the newline at the end of that specific header so that we get the whole integer?
content_length = headers[headers.find("Content-Length: ")+16:headers.find("Content-Length: ")+20]
I would try searching for "\r\n", but that is problematic because there are many of them in the HTTP headers. Unfortunately, we do not have access to urllib.
Example headers:
GET http://example.com/ HTTP/1.1\r\n
Content-Length: 95972\r\n
Keep-Alive: 300\r\n
Connection: keep-alive\r\n\r\n
Upvotes: 0
Views: 1716
Reputation: 106
Simple. Assuming your headers are stored in a string:
for line in headers.split("\r\n"):
    if line.startswith("Content-Length:"):
        # Everything after "Content-Length: " (16 characters) is the value.
        contentLength = int(line[16:])
Caveats: Not all HTTP messages contain a Content-Length header, and some messages may contain more than one.
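To make those caveats concrete, here is a minimal sketch that collects every Content-Length value instead of assuming exactly one. The function name `content_lengths` is made up for illustration, and it assumes the headers arrive as a single string with "\r\n" line endings, as in the question.

```python
def content_lengths(headers):
    """Return every Content-Length value found in a raw header string."""
    values = []
    for line in headers.split("\r\n"):
        # Split "Name: value" at the first colon only.
        name, sep, value = line.partition(":")
        # Header names are case-insensitive, so compare in lower case.
        if sep and name.strip().lower() == "content-length":
            values.append(int(value.strip()))
    return values
```

An empty list means the header is missing, and more than one entry means it was repeated; the caller decides how to handle either case. A value that is not a number will raise ValueError from int().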
Upvotes: 1
Reputation: 343
Header lines end with "\r\n", not "\n" alone. Please read the RFC for HTTP.
So you should read the Content-Length header until you hit the character "\r"; you can then check that the next character is "\n" for confirmation.
Or you could use a regular expression: "Content-Length:\s+\d+\s+"
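As a rough sketch of the regular-expression route, the variant below adds a capturing group so the digits can be pulled out directly, and anchors the match to the start of a line. The names `CONTENT_LENGTH_RE` and `parse_content_length` are illustrative, and the pattern is a slightly stricter variation of the one above, not the only way to write it.

```python
import re

# ^ with re.MULTILINE anchors to the start of a header line, so a header
# such as "X-Content-Length" is not matched by accident.
CONTENT_LENGTH_RE = re.compile(
    r"^Content-Length:[ \t]*(\d+)[ \t]*\r\n",
    re.IGNORECASE | re.MULTILINE,
)

def parse_content_length(headers):
    """Return the first Content-Length as an int, or None if absent."""
    match = CONTENT_LENGTH_RE.search(headers)
    if match is None:
        return None
    return int(match.group(1))
```

Because the pattern captures `\d+` rather than a fixed number of characters, it works for values of any length.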
EDIT: Yes, there can be many "\r\n" sequences in the body, but you don't have to parse them all. You only have to iterate over the header lines, which are separated by "\r\n", and take the line that starts with "Content-Length"; that is the one you are looking for. Additionally, the HTTP message body starts after "\r\n\r\n".
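Putting that together without any imports, something along these lines should work. It cuts the message at the first "\r\n\r\n" so that nothing in the body is searched, then uses find() to locate the "\r\n" that ends the Content-Length line. The function name `content_length_from` is hypothetical, and the sketch assumes the header is spelled exactly "Content-Length: " as in the question.

```python
def content_length_from(raw):
    """Return (content_length, body); content_length is None if absent."""
    # Everything before the first blank line is the header block.
    head, _, body = raw.partition("\r\n\r\n")
    key = "Content-Length: "
    start = head.find(key)
    if start == -1:
        return None, body
    start += len(key)
    # Search for the line ending only after the start of the value.
    end = head.find("\r\n", start)
    if end == -1:
        # Content-Length was the last header line; partition() already
        # removed its trailing "\r\n".
        end = len(head)
    return int(head[start:end]), body
```

Passing a start offset to find() is what avoids the "many \r\n" problem: only the first line ending after the value is considered, however many digits the value has.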
Upvotes: 0