Reputation: 25
I captured TCP data in Wireshark and exported it to CSV. Now I am trying to group the TCP packets per flow using Python, but I'm not sure how to do it.
If Source, Src Port, Destination, and Dest Port match across rows, in either the forward or the backward direction, the rows are considered part of the same flow, i.e. A->B and B->A.
In the example below there are two flows:
Source Src Port Destination Dest Port
10.129.200.119 49298 17.248.144.77 443
10.129.200.119 49299 17.253.37.210 80
No. Time Source Src Port Destination Dest Port Protocol Length Flags
37 12.045906 10.129.200.119 49298 17.248.144.77 443 TCP 54 0x010
38 12.04922 17.248.144.77 443 10.129.200.119 49298 TCP 66 0x010
39 13.634783 10.129.200.119 49298 17.248.144.77 443 TLSv1.2 112 0x018
40 13.635868 10.129.200.119 49298 17.248.144.77 443 TLSv1.2 97 0x018
41 13.636239 10.129.200.119 49298 17.248.144.77 443 TCP 66 0x011
42 13.640724 17.248.144.77 443 10.129.200.119 49298 TCP 66 0x010
43 13.640731 17.248.144.77 443 10.129.200.119 49298 TCP 66 0x011
44 13.640732 17.248.144.77 443 10.129.200.119 49298 TCP 66 0x010
45 13.640852 10.129.200.119 49298 17.248.144.77 443 TCP 66 0x011
47 14.472724 10.129.200.119 49299 17.253.37.210 80 TCP 78 0x0c2
48 14.478233 17.253.37.210 80 10.129.200.119 49299 TCP 74 0x052
50 14.478405 10.129.200.119 49299 17.253.37.210 80 TCP 66 0x010
51 14.479316 10.129.200.119 49299 17.253.37.210 80 HTTP 361 0x018
52 14.483419 17.253.37.210 80 10.129.200.119 49299 TCP 66 0x010
53 14.483425 17.253.37.210 80 10.129.200.119 49299 TCP 1514 0x010
54 14.483427 17.253.37.210 80 10.129.200.119 49299 TCP 1514 0x010
55 14.48343 17.253.37.210 80 10.129.200.119 49299 OCSP 319 0x018
56 14.48355 10.129.200.119 49299 17.253.37.210 80 TCP 66 0x010
57 14.483551 10.129.200.119 49299 17.253.37.210 80 TCP 66 0x010
58 14.486264 10.129.200.119 49299 17.253.37.210 80 TCP 66 0x011
59 14.490827 17.253.37.210 80 10.129.200.119 49299 TCP 66 0x011
60 14.490914 10.129.200.119 49299 17.253.37.210 80 TCP 66 0x010
Upvotes: 0
Views: 1776
Reputation: 576
I would recommend exporting the data from Wireshark to JSON instead, because it lets you group TCP sessions using information that isn't exported to CSV. To make a JSON file from your pcap, go to File -> Export Packet Dissections -> As JSON...
After you do so, look at the field tcp.stream, which has the same value for every packet in the same TCP stream ("flow").
Then you can use this code to go over the packets and search for a specific tcp.stream value:
import json

with open('path_to_your_json.json') as json_file:
    packets = json.load(json_file)

count = 0
for packet in packets:
    layers = packet["_source"]['layers']
    # Skip packets that have no TCP layer (e.g. ARP, UDP)
    if "tcp" in layers:
        # tcp.stream is stored as a string in the JSON export
        if layers["tcp"]["tcp.stream"] == "11":
            count = count + 1
print(count)
This code, for example, goes over all the TCP packets in stream number 11 and counts them.
To work efficiently and understand what you are doing, I recommend opening the JSON file in a text editor (like Sublime) to see what it contains and how it is nested. I would also recommend reading about JSON in Python: w3schools python and json
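If you want every stream rather than a single one, the same loop can collect packets into a dictionary keyed by tcp.stream. The following is only a minimal sketch: the function name group_by_stream and the tiny hand-made sample list are assumptions for illustration, and a real export would be loaded with json.load as in the code above.

```python
from collections import defaultdict


def group_by_stream(packets):
    """Group packets from a Wireshark JSON export by their tcp.stream value."""
    streams = defaultdict(list)
    for packet in packets:
        layers = packet["_source"]["layers"]
        tcp = layers.get("tcp")
        if tcp is None:
            # Not a TCP packet, so it belongs to no TCP stream
            continue
        streams[tcp["tcp.stream"]].append(packet)
    return dict(streams)


# Hand-made sample that mimics the nesting of the export (assumption, not real capture data)
sample = [
    {"_source": {"layers": {"tcp": {"tcp.stream": "0"}}}},
    {"_source": {"layers": {"tcp": {"tcp.stream": "1"}}}},
    {"_source": {"layers": {"tcp": {"tcp.stream": "0"}}}},
    {"_source": {"layers": {"arp": {}}}},
]

streams = group_by_stream(sample)
for stream_id, stream_packets in streams.items():
    print(stream_id, len(stream_packets))
```

Each dictionary value is the list of packets of one flow, in capture order, so per-flow processing can work on those lists directly.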
Upvotes: 1
Reputation: 5757
Maybe you can try pandas. The snippet below groups the rows of data according to the source IP address.
I am not familiar with what you mean by flow. I am assuming it means grouping according to the source and destination IP pairs.
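import pandas as pd

# data.txt holds the packet rows only (no header line), whitespace-separated
with open('data.txt') as f:
    lines = f.readlines()

data = []
for line in lines:
    tokens = line.split()
    data.append(tokens)

# Columns A..I: No., Time, Source, Src Port, Destination, Dest Port, Protocol, Length, Flags
df = pd.DataFrame(data, columns=list("ABCDEFGHI"))
print(df)

# Group by column C (the source IP address)
grouped_df = df.groupby('C', as_index=False)
for key, item in grouped_df:
    print(item, "\n\n")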
This gives output like the following:
[8 rows x 9 columns]
A B C D ... F G H I
0 37 12.045906 10.129.200.119 49298 ... 443 TCP 54 0x010
2 39 13.634783 10.129.200.119 49298 ... 443 TLSv1.2 112 0x018
3 40 13.635868 10.129.200.119 49298 ... 443 TLSv1.2 97 0x018
4 41 13.636239 10.129.200.119 49298 ... 443 TCP 66 0x011
[4 rows x 9 columns]
A B C D E F G H I
1 38 12.04922 17.248.144.77 443 10.129.200.119 49298 TCP 66 0x010
5 42 13.640724 17.248.144.77 443 10.129.200.119 49298 TCP 66 0x010
6 43 13.640731 17.248.144.77 443 10.129.200.119 49298 TCP 66 0x011
7 44 13.640732 17.248.144.77 443 10.129.200.119 49298 TCP 66 0x010
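As the output above shows, grouping on column C alone puts each direction of a connection into its own group. One possible way to get the A->B and B->A rows together is to build a direction-independent key from the two endpoints. This is only a sketch: the helper flow_key and the added column name "flow" are assumptions for illustration, and the rows are copied from the question.

```python
import pandas as pd

rows = [
    "37 12.045906 10.129.200.119 49298 17.248.144.77 443 TCP 54 0x010",
    "38 12.04922 17.248.144.77 443 10.129.200.119 49298 TCP 66 0x010",
    "47 14.472724 10.129.200.119 49299 17.253.37.210 80 TCP 78 0x0c2",
    "48 14.478233 17.253.37.210 80 10.129.200.119 49299 TCP 74 0x052",
]
df = pd.DataFrame([r.split() for r in rows], columns=list("ABCDEFGHI"))


def flow_key(src, sport, dst, dport):
    """Return the same key for A->B and B->A by sorting the two endpoints."""
    low, high = sorted([f"{src}:{sport}", f"{dst}:{dport}"])
    return f"{low} <-> {high}"


# C/D are the source IP and port, E/F the destination IP and port
df["flow"] = [flow_key(*t) for t in zip(df["C"], df["D"], df["E"], df["F"])]

for key, group in df.groupby("flow"):
    print(key)
    print(group[["A", "C", "D", "E", "F"]], "\n")
```

Any consistent ordering of the two endpoints works here; the string sort is only used to make the key canonical. Note that a key built this way would merge two separate connections that happen to reuse the same addresses and ports later in the capture.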
Upvotes: 0
Reputation: 1285
You can use pandas to do this if you rename your columns Src Port to Src_Port and Dest Port to Dest_Port.
Assuming that the combination of ['Source', 'Src_Port', 'Destination', 'Dest_Port', 'Protocol'] defines a 'flow' (I am by no means a domain expert) and your data is in 'wireshark_dump.csv', you can do the following:
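import pandas as pd

# sep=r'\s+' splits on runs of whitespace (delim_whitespace is deprecated in recent pandas)
df = pd.read_csv('wireshark_dump.csv', sep=r'\s+')
flow_columns = ['Source', 'Src_Port', 'Destination', 'Dest_Port', 'Protocol']

# Each group holds the rows that share the same values in flow_columns
for flow, flow_data in df.groupby(flow_columns):
    print(flow)
    print(flow_data)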
Note that depending on what your further processing looks like, you might not want to iterate over the groupby groups as it is slow.
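If all you need is per-group statistics, a sketch like the following avoids the Python-level loop by letting groupby aggregate directly. The output column names packets, total_bytes and first_seen are assumptions for illustration, and the inline text stands in for 'wireshark_dump.csv' with the renamed header.

```python
import io

import pandas as pd

# Stand-in for the CSV file, using rows from the question
csv_text = """No. Time Source Src_Port Destination Dest_Port Protocol Length Flags
37 12.045906 10.129.200.119 49298 17.248.144.77 443 TCP 54 0x010
38 12.04922 17.248.144.77 443 10.129.200.119 49298 TCP 66 0x010
39 13.634783 10.129.200.119 49298 17.248.144.77 443 TLSv1.2 112 0x018
40 13.635868 10.129.200.119 49298 17.248.144.77 443 TLSv1.2 97 0x018
"""
df = pd.read_csv(io.StringIO(csv_text), sep=r"\s+")

flow_columns = ["Source", "Src_Port", "Destination", "Dest_Port", "Protocol"]

# One row per group: packet count, byte total and first timestamp
summary = (
    df.groupby(flow_columns)
    .agg(
        packets=("Length", "count"),
        total_bytes=("Length", "sum"),
        first_seen=("Time", "min"),
    )
    .reset_index()
)
print(summary)
```

With these columns the two directions of a connection, and the TCP and TLSv1.2 rows of the same connection, still end up in separate groups, so the key would need adjusting if a flow is meant to cover both directions.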
Upvotes: 0