jeffrey

Reputation: 3354

How to parse pcap headers in python while retaining header field order

Currently I am parsing HTTP headers from pcap files like so:

import sys
import dpkt

f = open(sys.argv[1], "rb") # pass in pcap file as argument to script
fout = open("path to header output file", "a")
pcap = dpkt.pcap.Reader(f)

# master holds string to write
master = ""
print "Working ..."
for ts, buf in pcap:
  l2 = dpkt.ethernet.Ethernet(buf)
  if l2.type == 2048: # only IPv4 (EtherType 0x0800 == 2048), no ARP
    l3=l2.data
    if l3.p == dpkt.ip.IP_PROTO_TCP: #IP TCP
      l4=l3.data
      if l4.dport==80 and len(l4.data)>0:
        try:
          http=dpkt.http.Request(l4.data)
          dict_headers = http.headers
          http_method = http.method 
          http_uri = http.uri
          http_body = http.body
          http_version = http.version

          # this is for first line, method + uri, e.g. GET URI
          master += unicode( http_method + ' ' +  http_uri + ' ' + 'HTTP/' +  http_version + '\n','utf-8')

          for key,val in dict_headers.iteritems():
            master += unicode( key + ': ' + val + '\n', 'utf-8')

          master += '\n'
        except:
          master += unicode( l4.data, 'utf-8')
          continue

# perform writing and closing of files, etc

The problem is that dpkt stores the HTTP header fields in a dictionary (http.headers), which is unordered. I need to retain the original order of the fields. Is there any way around this?

Upvotes: 0

Views: 1109

Answers (1)

Hod Gavriel

Reputation: 115

Two options:

  1. Modify dpkt's code to use an OrderedDict instead of a regular dictionary (I haven't tried this). An OrderedDict preserves insertion order.

  2. Parse the HTTP request yourself. Each header line ends with \x0d\x0a (CRLF), and each header name is followed by ':', so you can split the header block and build an ordered list of the header names this way:

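    l5 = l4.data
    # Header lines sit between the request line and the blank line (CRLF CRLF)
    header_block = l5[l5.index('\x0d\x0a')+2:l5.index('\x0d\x0a\x0d\x0a')]
    headers_and_content = header_block.split('\x0d\x0a')
    ordered_headers = []
    for item in headers_and_content:
        # keep only the header name (the text before the first ':')
        ordered_headers.append(item[:item.index(':')])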
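
If you also need the header values, the same idea can be extended to keep (name, value) pairs in wire order. The following is only a rough sketch, not part of dpkt: `parse_ordered_headers` is a hypothetical helper name, and it assumes the whole header block arrived in a single TCP payload (bytes, such as l4.data in the question), which is not guaranteed for large requests.

```python
def parse_ordered_headers(raw):
    """Return (request_line, [(name, value), ...]) in on-the-wire order.

    `raw` is assumed to be one TCP payload holding the complete
    header block, e.g. l4.data from the question's loop.
    """
    # Everything before the first blank line is the request line plus headers.
    head, sep, _body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("incomplete HTTP header block")

    lines = head.split(b"\r\n")
    request_line = lines[0].decode("latin-1")

    headers = []
    for line in lines[1:]:
        # Split on the first ':' only, so values such as "host:8080" stay intact.
        name, colon, value = line.partition(b":")
        if not colon:
            continue  # skip malformed lines that have no colon
        headers.append((name.decode("latin-1").strip(),
                        value.decode("latin-1").strip()))
    return request_line, headers


# Made-up sample payload, for illustration only
sample = (b"GET /index.html HTTP/1.1\r\n"
          b"Host: example.com:8080\r\n"
          b"User-Agent: test\r\n"
          b"Accept: */*\r\n"
          b"\r\n")
request_line, headers = parse_ordered_headers(sample)
```

A list of tuples is used rather than a dictionary so that repeated header names and their original order both survive. To write the result out as in the question, you can emit request_line followed by one "name: value" line per tuple.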
    

Upvotes: 1
