Reputation: 3354
Currently I am parsing HTTP headers from pcap files like so:
import sys
import dpkt

f = open(sys.argv[1], "rb")  # pass in pcap file as argument to script
fout = open("path to header output file", "a")
pcap = dpkt.pcap.Reader(f)
# master holds string to write
master = ""
print "Working ..."
for ts, buf in pcap:
    l2 = dpkt.ethernet.Ethernet(buf)
    if l2.type == 2048:  # only for IP (EtherType 2048 = 0x0800), no ARP
        l3 = l2.data
        if l3.p == dpkt.ip.IP_PROTO_TCP:  # IP TCP
            l4 = l3.data
            if l4.dport == 80 and len(l4.data) > 0:
                try:
                    http = dpkt.http.Request(l4.data)
                    dict_headers = http.headers
                    http_method = http.method
                    http_uri = http.uri
                    http_body = http.body
                    http_version = http.version
                    # this is for first line, method + uri, e.g. GET URI
                    master += unicode(http_method + ' ' + http_uri + ' ' + 'HTTP/' + http_version + '\n', 'utf-8')
                    for key, val in dict_headers.iteritems():
                        master += unicode(key + ': ' + val + '\n', 'utf-8')
                    master += '\n'
                except:
                    # payload is not a parseable HTTP request; keep the raw bytes
                    master += unicode(l4.data, 'utf-8')
                    continue
# perform writing and closing of files, etc
The problem is that dpkt stores the HTTP fields in a dictionary (http.headers), which is unordered. I need to retain the order in which the fields appear in the request. Is there any way around this?
Upvotes: 0
Views: 1109
Reputation: 115
Two options:
Alter the dpkt code to use OrderedDict instead of a regular dictionary (I haven't tried this). OrderedDict preserves insertion order.
Parse the HTTP request yourself. Each header line ends with \x0d\x0a (CRLF), and a ':' separates each header name from its value, so you can split the header block and build an ordered list of the header names like this:
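l5 = l4.data
# Take everything between the end of the request line and the blank line
# that terminates the headers, then split it into one string per header.
# Note: index() raises ValueError if the segment has no complete header block.
headers_and_content = l5[l5.index('\x0d\x0a')+2:l5.index('\x0d\x0a\x0d\x0a')].split('\x0d\x0a')
ordered_headers = []
for item in headers_and_content:
    # keep only the header name (the part before the first ':')
    ordered_headers.append(item[:item.index(':')])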
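If you also need the values, the same idea can be wrapped in a small helper. This is only a sketch: the function name `ordered_headers` and the sample request are made up for illustration, it assumes the whole header block sits in a single TCP payload, and it does not handle obsolete folded (multi-line) headers. It returns a list of (name, value) tuples, so the order and any repeated header names are kept exactly as they appeared on the wire.

```python
def ordered_headers(payload):
    """Return (request_line, [(name, value), ...]) in wire order.

    `payload` is the raw TCP payload as bytes (l4.data in the question).
    Raises ValueError if the end-of-headers marker is missing.
    """
    end = payload.find(b'\r\n\r\n')
    if end == -1:
        raise ValueError('no end-of-headers marker in this segment')

    lines = payload[:end].split(b'\r\n')
    request_line, header_lines = lines[0], lines[1:]

    headers = []
    for line in header_lines:
        # partition() splits on the first ':' only, so values that
        # contain colons (e.g. "Host: example.com:8080") stay intact
        name, sep, value = line.partition(b':')
        if not sep:
            continue  # skip malformed lines without a colon
        headers.append((name.strip(), value.strip()))
    return request_line, headers


# Hypothetical sample request, used only to show the call
raw = (b'GET /index.html HTTP/1.1\r\n'
       b'Host: example.com\r\n'
       b'User-Agent: test\r\n'
       b'Accept: */*\r\n'
       b'\r\n')
request_line, headers = ordered_headers(raw)
```

In the question's loop you would call it on l4.data inside the try block and write the request line and the pairs to the output in the order returned, catching ValueError for segments that do not contain a full header block.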
Upvotes: 1