Reputation: 2270

Python2 and Python3 DPKT appears to return different output formats

The dpkt library says it now supports Python 3, but it behaves differently under Python 2.x and 3.x, and neither result seems correct.

In Python 2.x, for instance, the example from the dpkt tutorial

import dpkt

with open('test.pcap') as f:
    pcap = dpkt.pcap.Reader(f)
    for ts, buf in pcap:
        eth = dpkt.ethernet.Ethernet(buf)
        print eth

prints output I don't expect, something like:

[several lines of unreadable binary characters: the raw frame bytes written straight to the terminal]

In Python 3, however, I have to open the pcap file in 'rb' mode. That's fine, but the output is still wrong (I'm not sure whether 'rb' has anything to do with it):

import dpkt

with open('test.pcap', 'rb') as f:
    pcap = dpkt.pcap.Reader(f)
    for ts, buf in pcap:
        eth = dpkt.ethernet.Ethernet(buf)
        print eth

This now prints what I believe is a bytestring, and I haven't found a way to get the data I need out of it. For example, the example on their site easily gets 17 for the number of flags, but I can't get their example to work at all. This is the output I get:

b'\x00\x0f\x1f\x16\xd1\xcd\x00\xc0\xf0y\x9a\xfd\x08\x00E\x00\x00\x1c\xb1\xce\x00\x006\x01N\xf7\xc0\xa8\x01d\xc0\xa8\x01g\x08\x00\xd9\xd7\xb7\xc4fc'

I haven't had any luck converting this string into a human-readable object. No combination of decode, binascii, or anything else I've tried has worked. Am I using this library incorrectly?

Upvotes: 0

Answers (2)

Shevach Riabtsev

Reputation: 495

Try opening the pcap file in binary mode: with open('test.pcap', 'rb')

Upvotes: 0

Gil Hamilton

Reputation: 12347

One of the major differences between python2 and python3 is that in python3, str and bytes are no longer the same. Compare:

$ python2 -c 'print(b"foo" == "foo")'
True

$ python3 -c 'print(b"foo" == "foo")'
False

This explains why you must open the file with "rb" in python3: in text mode the file gives you str, not the bytes that dpkt needs. (Even in python2 you would quite likely get bogus results without the "b" on some platforms, because bytes in the file that happen to look like line endings may be translated inappropriately.)
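A small stdlib-only sketch of what goes wrong in text mode. It uses io.BytesIO and io.TextIOWrapper to stand in for a file on disk, and the 8-byte blob is made up for illustration (the little-endian pcap magic number followed by a CR LF pair). Python 3's text mode hands back a str and, with universal newlines, quietly turns CR LF into LF, so the data no longer matches the file.

```python
import io

# Made-up binary data: the little-endian pcap magic number followed by bytes
# that happen to look like a Windows line ending (CR LF).
RAW = b"\xd4\xc3\xb2\xa1\r\n\x00\x01"

# Binary mode ('rb'): bytes come back, byte-for-byte identical to the file.
binary = io.BytesIO(RAW).read()

# Text mode (what open() does without 'b'): bytes are decoded to str and
# universal-newline translation turns "\r\n" into "\n". latin-1 is used here
# only so the decode itself cannot fail; a UTF-8 decode would typically raise
# UnicodeDecodeError on real capture data.
text = io.TextIOWrapper(io.BytesIO(RAW), encoding="latin-1").read()

print(type(binary).__name__, binary == RAW)                # bytes True
print(type(text).__name__, text.encode("latin-1") == RAW)  # str False
```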

Another difference: in python3, print is a function, not a statement, so the python3 code you've shown above is actually a syntax error. You need print(eth) instead.

To answer your actual question: When you simply print eth, you are implicitly asking the eth object to make itself printable. That is the same as calling print(str(eth)) and so it's giving you a printable string version of the binary data buffer that contains the ethernet frame.
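In plain Python 3 terms, str() of a bytes object is just its literal form, which is why the output starts with b'\x00\x0f...'. The sketch below takes the first six bytes of the frame from the question (the destination MAC address) and shows that str() and repr() agree for bytes, and how bytes.hex() or formatting the individual integers gives something readable. The mac_str helper is a hypothetical name used only for illustration.

```python
# First six bytes of the frame from the question: the destination MAC address.
dst = b"\x00\x0f\x1f\x16\xd1\xcd"

# For bytes, str() falls back to the literal representation, so print(dst)
# and print(repr(dst)) produce the same text.
print(dst)                    # b'\x00\x0f\x1f\x16\xd1\xcd'
print(str(dst) == repr(dst))  # True

# bytes.hex() gives a compact readable form.
print(dst.hex())              # 000f1f16d1cd


def mac_str(raw: bytes) -> str:
    """Hypothetical helper: iterating over bytes yields ints, so format each."""
    return ":".join(f"{b:02x}" for b in raw)


print(mac_str(dst))           # 00:0f:1f:16:d1:cd
```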

You need to use the facilities of dpkt to discover, then dissect the parts of the frame that you care about.

Here's a short example that decodes a pcap containing DNS packets:

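import dpkt

with open("/tmp/dns.pcap", "rb") as f:  # binary mode is required on python3
    pcap = dpkt.pcap.Reader(f)
    for ts, buf in pcap:
        # Layer 2: parse the raw bytes as an Ethernet frame
        l2 = dpkt.ethernet.Ethernet(buf)
        print("Ethernet (L2) frame:", repr(l2))

        if l2.type not in (dpkt.ethernet.ETH_TYPE_IP, dpkt.ethernet.ETH_TYPE_IP6):
            print("Not an IP packet")
            continue
        # Layer 3: dpkt has already decoded the payload into an IP/IP6 object
        l3 = l2.data
        print("IP packet:", repr(l3))

        if l3.p not in (dpkt.ip.IP_PROTO_TCP, dpkt.ip.IP_PROTO_UDP):
            print("Not TCP or UDP")
            continue

        # Layer 4: a dpkt.tcp.TCP or dpkt.udp.UDP object
        l4 = l3.data
        print("Layer 4:", repr(l4))

        # DNS over TCP prefixes each message with a 2-byte length, so only
        # UDP payloads are handed straight to dpkt.dns.DNS here
        if not isinstance(l4, dpkt.udp.UDP):
            continue
        if l4.dport in (53, 5353) or l4.sport in (53, 5353):
            dns = l4.data
            if not isinstance(dns, dpkt.dns.DNS):
                try:
                    dns = dpkt.dns.DNS(dns)
                except dpkt.dpkt.UnpackError:
                    print("Malformed DNS payload")
                    continue
            print("DNS packet:", repr(dns))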
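For comparison, the layer-by-layer dissection that the loop above gets from dpkt can also be done by hand with only the standard library's struct and socket modules. This is a minimal sketch, not how dpkt is implemented: it assumes an untagged Ethernet II frame carrying IPv4, reuses the frame from the question, and the decode function name is hypothetical.

```python
import socket
import struct

# The frame from the question.
FRAME = (
    b"\x00\x0f\x1f\x16\xd1\xcd\x00\xc0\xf0y\x9a\xfd\x08\x00"
    b"E\x00\x00\x1c\xb1\xce\x00\x006\x01N\xf7\xc0\xa8\x01d\xc0\xa8\x01g"
    b"\x08\x00\xd9\xd7\xb7\xc4fc"
)


def decode(frame: bytes) -> dict:
    """Hypothetical helper: decode Ethernet II + IPv4 headers by hand."""
    # Ethernet II header: 6-byte dst MAC, 6-byte src MAC, 2-byte EtherType.
    dst_mac, src_mac, eth_type = struct.unpack("!6s6sH", frame[:14])
    out = {
        "dst_mac": ":".join(f"{b:02x}" for b in dst_mac),
        "src_mac": ":".join(f"{b:02x}" for b in src_mac),
        "eth_type": eth_type,
    }
    if eth_type != 0x0800:  # only IPv4 is handled in this sketch
        return out

    # Fixed 20-byte part of the IPv4 header, in network byte order.
    (ver_ihl, _tos, total_len, _ident, _frag,
     ttl, proto, _csum, src, dst) = struct.unpack("!BBHHHBBH4s4s", frame[14:34])
    ihl = (ver_ihl & 0x0F) * 4  # full header length in bytes, options included
    out.update(
        version=ver_ihl >> 4,
        header_len=ihl,
        total_len=total_len,
        ttl=ttl,
        proto=proto,
        src=socket.inet_ntoa(src),
        dst=socket.inet_ntoa(dst),
        payload=frame[14 + ihl:14 + total_len],  # the ICMP/TCP/UDP bytes
    )
    return out


print(decode(FRAME))
```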

As for why your output looks different from the tutorial's: the tutorial is out of date. Apparently at some point the implementation of the __str__ magic method on dpkt objects changed (when you simply print an object, you get the result of its __str__ method).

Originally, __str__ returned a formatted representation of the object. Now it just returns a string representation of the object's raw bytes, so you need to call repr(obj) to get the formatted representation.

Upvotes: 3
