abdi must

Reputation: 25

How to group Wireshark TCP packets per flow using Python

I captured TCP data in Wireshark and exported it to CSV. Now I am trying to group the TCP packets per flow using Python, but I'm not sure how to do it.

If Source, Src Port, Destination and Dest Port match across rows, in either direction, the packets are considered part of the same flow, i.e. A->B and B->A belong together.

In the example below there are two flows:

Source          Src Port    Destination     Dest Port
10.129.200.119  49298       17.248.144.77   443 
10.129.200.119  49299       17.253.37.210   80

No. Time    Source  Src Port    Destination Dest Port   Protocol    Length  Flags
37  12.045906   10.129.200.119  49298   17.248.144.77   443 TCP 54  0x010
38  12.04922    17.248.144.77   443 10.129.200.119  49298   TCP 66  0x010
39  13.634783   10.129.200.119  49298   17.248.144.77   443 TLSv1.2 112 0x018
40  13.635868   10.129.200.119  49298   17.248.144.77   443 TLSv1.2 97  0x018
41  13.636239   10.129.200.119  49298   17.248.144.77   443 TCP 66  0x011
42  13.640724   17.248.144.77   443 10.129.200.119  49298   TCP 66  0x010
43  13.640731   17.248.144.77   443 10.129.200.119  49298   TCP 66  0x011
44  13.640732   17.248.144.77   443 10.129.200.119  49298   TCP 66  0x010
45  13.640852   10.129.200.119  49298   17.248.144.77   443 TCP 66  0x011
47  14.472724   10.129.200.119  49299   17.253.37.210   80  TCP 78  0x0c2
48  14.478233   17.253.37.210   80  10.129.200.119  49299   TCP 74  0x052
50  14.478405   10.129.200.119  49299   17.253.37.210   80  TCP 66  0x010
51  14.479316   10.129.200.119  49299   17.253.37.210   80  HTTP    361 0x018
52  14.483419   17.253.37.210   80  10.129.200.119  49299   TCP 66  0x010
53  14.483425   17.253.37.210   80  10.129.200.119  49299   TCP 1514    0x010
54  14.483427   17.253.37.210   80  10.129.200.119  49299   TCP 1514    0x010
55  14.48343    17.253.37.210   80  10.129.200.119  49299   OCSP    319 0x018
56  14.48355    10.129.200.119  49299   17.253.37.210   80  TCP 66  0x010
57  14.483551   10.129.200.119  49299   17.253.37.210   80  TCP 66  0x010
58  14.486264   10.129.200.119  49299   17.253.37.210   80  TCP 66  0x011
59  14.490827   17.253.37.210   80  10.129.200.119  49299   TCP 66  0x011
60  14.490914   10.129.200.119  49299   17.253.37.210   80  TCP 66  0x010

Upvotes: 0

Views: 1776

Answers (3)

TomE8

Reputation: 576

I would recommend exporting the data from Wireshark in JSON format, because it lets you group TCP sessions using information that isn't exported to CSV. To create a JSON file from your pcap, go to File -> Export Packet Dissections -> As JSON...

After that, look at the tcp.stream field: it has the same value for every packet in the same TCP stream ("flow").

You can then use code like this to go over the packets and look for a specific tcp.stream value:

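import json

# Load the list of packets exported from Wireshark
with open('path_to_your_json.json') as json_file:
    packets = json.load(json_file)

count = 0
for packet in packets:
    layers = packet["_source"]["layers"]
    # Non-TCP packets have no "tcp" layer; tcp.stream is compared as a string
    if "tcp" in layers and layers["tcp"]["tcp.stream"] == "11":
        count += 1
print(count)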

This code, for example, finds all the TCP packets in stream number 11 and counts them.
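
To collect every stream rather than count a single one, the same loop can fill a dictionary keyed by tcp.stream. This is a minimal sketch: `group_by_stream` and the hand-written `sample` list are hypothetical, and the sample only mimics the `_source` / `layers` / `tcp` nesting used in the code above.

```python
from collections import defaultdict


def group_by_stream(packets):
    """Map each tcp.stream value to the list of packets that carry it."""
    streams = defaultdict(list)
    for packet in packets:
        layers = packet["_source"]["layers"]
        tcp = layers.get("tcp")
        if tcp is not None:  # skip packets without a TCP layer
            streams[tcp["tcp.stream"]].append(packet)
    return dict(streams)


# Hand-written stand-in for the exported JSON (assumed structure).
sample = [
    {"_source": {"layers": {"tcp": {"tcp.stream": "0"}}}},
    {"_source": {"layers": {"tcp": {"tcp.stream": "1"}}}},
    {"_source": {"layers": {"tcp": {"tcp.stream": "0"}}}},
    {"_source": {"layers": {"udp": {}}}},
]

flows = group_by_stream(sample)
for stream_id, stream_packets in flows.items():
    print(stream_id, len(stream_packets))
```

With a real export, pass the list returned by json.load to `group_by_stream` instead of `sample`.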

To work efficiently and understand what you are doing, I recommend opening the JSON file in a text editor (such as Sublime Text) to see what it contains and how it is structured. I would also recommend reading about JSON in Python: w3schools python and json

Upvotes: 1

abhilb

Reputation: 5757

Maybe you can try pandas. The snippet below groups the rows of data according to the source IP address.

I am not familiar with what you mean by flow. I am assuming it means grouping according to the source and destination IP pairs.

import pandas as pd

# data.txt holds the whitespace-separated packet rows, without the header line
with open('data.txt') as f:
    data = [line.split() for line in f]

# Columns A..I: No., Time, Source, Src Port, Destination, Dest Port, Protocol, Length, Flags
df = pd.DataFrame(data, columns=list("ABCDEFGHI"))
print(df)

# Group on column C (the source address) and print each group
grouped_df = df.groupby('C', as_index=False)
for key, item in grouped_df:
    print(item, "\n\n")

This gives the following output:

[8 rows x 9 columns]
    A          B               C      D  ...    F        G    H      I
0  37  12.045906  10.129.200.119  49298  ...  443      TCP   54  0x010
2  39  13.634783  10.129.200.119  49298  ...  443  TLSv1.2  112  0x018
3  40  13.635868  10.129.200.119  49298  ...  443  TLSv1.2   97  0x018
4  41  13.636239  10.129.200.119  49298  ...  443      TCP   66  0x011

[4 rows x 9 columns] 


    A          B              C    D               E      F    G   H      I
1  38   12.04922  17.248.144.77  443  10.129.200.119  49298  TCP  66  0x010
5  42  13.640724  17.248.144.77  443  10.129.200.119  49298  TCP  66  0x010
6  43  13.640731  17.248.144.77  443  10.129.200.119  49298  TCP  66  0x011
7  44  13.640732  17.248.144.77  443  10.129.200.119  49298  TCP  66  0x010 

Upvotes: 0

BStadlbauer

Reputation: 1285

You can use pandas to do this, provided you rename the columns Src Port to Src_Port and Dest Port to Dest_Port.

Assuming that the combination of ['Source', 'Src_Port', 'Destination', 'Dest_Port', 'Protocol'] identifies a 'flow' (I am by no means a domain expert) and your data is in 'wireshark_dump.csv', you can do the following:

import pandas as pd


# sep=r'\s+' splits on whitespace (delim_whitespace=True is deprecated in recent pandas)
df = pd.read_csv('wireshark_dump.csv', sep=r'\s+')

flow_columns = ['Source', 'Src_Port', 'Destination', 'Dest_Port', 'Protocol']
for flow, flow_data in df.groupby(flow_columns):
    print(flow)
    print(flow_data)

Note that, depending on what your further processing looks like, you might not want to iterate over the groupby groups, as it is slow.
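
Grouping on the columns as they stand treats A->B and B->A as two different flows, whereas the question wants them merged. One possible approach is to build a direction-independent key by sorting the two (address, port) endpoints. The sketch below uses plain Python and a few rows copied from the question; `flow_key` is a hypothetical helper, and leaving Protocol out of the key is an assumption made here, since the question's first flow mixes TCP and TLSv1.2 rows.

```python
from collections import defaultdict


def flow_key(src, src_port, dst, dst_port):
    """Return the same key for A->B and B->A by sorting the two endpoints."""
    return tuple(sorted([(src, int(src_port)), (dst, int(dst_port))]))


# Rows copied from the question: (No., Source, Src Port, Destination, Dest Port).
rows = [
    (37, "10.129.200.119", 49298, "17.248.144.77", 443),
    (38, "17.248.144.77", 443, "10.129.200.119", 49298),
    (47, "10.129.200.119", 49299, "17.253.37.210", 80),
    (48, "17.253.37.210", 80, "10.129.200.119", 49299),
]

# Collect packet numbers per flow, regardless of direction.
flows = defaultdict(list)
for number, src, src_port, dst, dst_port in rows:
    flows[flow_key(src, src_port, dst, dst_port)].append(number)

for key, numbers in flows.items():
    print(key, numbers)
```

In pandas, the same key could be stored in an extra column (for example with a row-wise apply) and passed to groupby in place of the four address and port columns.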

Upvotes: 0
