Reputation: 107
I have two issues I am trying to resolve.
First, for each IP address stored in the dictionary frequency4, I want to go through the lines of data in the text file and, whenever that IP address appears in column [4], add that line's byte count to a running total for that exact IP.
Second, if column [8] (under Bytes) is followed by an "M", meaning million, the value should be multiplied by 1000000, so 3.3 M becomes 3300000 (see the data from the text file below). Keep in mind that this is only a sample; the real text file contains thousands of lines of data.
The output I am looking for is:
Total bytes for ip 172.217.9.133 is 3300000
Total bytes for ip 205.251.24.253 is 9516
Total bytes for ip 52.197.234.56 is 14546
CODE
from collections import OrderedDict
from collections import Counter
frequency4 = Counter({})
ttlbytes = 0
with open('/Users/rm/Desktop/nettestWsum.txt', 'r') as infile:
    next(infile)
    for line in infile:
        if "Summary:" in line:
            break
        try:
            srcip = line.split()[4].rsplit(':', 1)[0]
            frequency4[srcip] = frequency4.get(srcip,0) + 1
            f4 = OrderedDict(frequency4.most_common())
            for srcip in f4:
                ttlbytes += int(line.split()[8])
        except(ValueError):
            pass
print("\nTotal bytes for ip",srcip, "is:", ttlbytes)
for srcip, count in f4.items():
    print("\nIP address from destination:", srcip, "was found:", count, "times.")
DATA FILE
Date first seen Duration Proto Src IP Addr:Port Dst IP Addr:Port Packets Bytes Flows
2017-04-11 07:23:17.880 929.748 UDP 172.217.9.133:443 -> 205.166.231.250:41138 3019 3.3 M 1
2017-04-11 07:38:40.994 6.676 TCP 205.251.24.253:443 -> 205.166.231.250:24723 16 4758 1
2017-04-11 07:38:40.994 6.676 TCP 205.251.24.253:443 -> 205.166.231.250:24723 16 4758 1
2017-04-11 07:38:41.258 6.508 TCP 52.197.234.56:443 -> 205.166.231.250:13712 14 7273 1
2017-04-11 07:38:41.258 6.508 TCP 52.197.234.56:443 -> 205.166.231.250:13712 14 7273 1
Summary: total flows: 22709, total bytes: 300760728, total packets: 477467, avg bps: 1336661, avg pps: 265, avg bpp: 629
Time window: 2017-04-11 07:13:47 - 2017-04-11 07:43:47
Total flows processed: 22709, Blocks skipped: 0, Bytes read: 1544328
Sys: 0.372s flows/second: 61045.7 Wall: 0.374s flows/second: 60574.9
Upvotes: 0
Views: 53
Reputation: 303
I'm not sure whether you need to edit the same file. If you are just looking to process the data and view it, you could use pandas, which has many functions that speed up this kind of data processing.
import pandas as pd
# Columns are separated by two or more spaces; a regex separator and skipfooter both use the python engine
df = pd.read_csv(filepath_or_buffer='/Users/rm/Desktop/nettestWsum.txt', index_col=False, header=None, skiprows=1, sep=r'\s\s+', skipfooter=4, engine='python')
# Drop the -> column
df.drop(labels=3, axis=1, inplace=True)
columnnames = 'Date first seen,Duration Proto,Src IP Addr:Port,Dst IP Addr:Port,Packets,Bytes,Flows'
columnnames = columnnames.split(',')
df.columns = columnnames
This loads the data into a DataFrame (table). I would suggest reading the documentation for pandas.read_csv. To process the data, you can try the code below.
# Convert values with an 'M' suffix (e.g. '3.3 M') to plain numbers in millions
df['Bytes'] = df['Bytes'].apply(lambda x: float(str(x)[:-1]) * 1000000 if str(x).endswith('M') else x)
df['Bytes'] = pd.to_numeric(df['Bytes'])
# Sum only the Bytes column for each group
result = df.groupby(by='Dst IP Addr:Port')['Bytes'].sum()
The result is a pandas object that you can keep working with. I think this is faster than looping through the lines, but I have not measured it, so you may want to test that separately.
(Screenshots: the DataFrame after loading, and the output of the groupby, both taken from the variable explorer in the Spyder IDE.)
You can tweak the groupby as needed, and you can inspect the result by printing it or by saving it as another CSV.
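Since the question asks for totals per source IP without the port, one possible tweak of the groupby is sketched below. This is only an illustration under stated assumptions: the small DataFrame is built inline from the sample rows instead of being read from the file, and the names `to_bytes` and `Src IP` are made up for this example.

```python
import pandas as pd

# Inline stand-in for the loaded file: only the two columns needed here,
# with values copied from the sample rows in the question.
df = pd.DataFrame({
    "Src IP Addr:Port": [
        "172.217.9.133:443",
        "205.251.24.253:443",
        "205.251.24.253:443",
        "52.197.234.56:443",
        "52.197.234.56:443",
    ],
    "Bytes": ["3.3 M", "4758", "4758", "7273", "7273"],
})


def to_bytes(value):
    """Hypothetical helper: turn '3.3 M' into 3300000 and '4758' into 4758."""
    text = str(value).strip()
    if text.endswith("M"):
        return round(float(text[:-1]) * 1_000_000)
    return int(text)


df["Bytes"] = df["Bytes"].map(to_bytes)

# Strip the port by splitting once from the right on ':'.
df["Src IP"] = df["Src IP Addr:Port"].str.rsplit(":", n=1).str[0]

# Sum bytes per source IP, largest total first.
totals = df.groupby("Src IP")["Bytes"].sum().sort_values(ascending=False)

for ip, total in totals.items():
    print(f"Total bytes for ip {ip} is {total}")
```

Selecting the Bytes column before calling sum() keeps the non-numeric columns out of the aggregation.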
Upvotes: 0
Reputation: 196
I don't know what you need the frequency for, but given your input, here's how to get the desired output:
from collections import Counter
count = Counter()
with open('/Users/rm/Desktop/nettestWsum.txt', 'r') as infile:
    next(infile)  # skip the header line
    for line in infile:
        if "Summary:" in line:
            break
        parts = line.split()
        srcip = parts[4].rsplit(':', 1)[0]
        # '3.3 M' splits into '3.3' and 'M', so the suffix lands in parts[9]
        multiplier = 10**6 if parts[9] == 'M' else 1
        # round() avoids truncating a float product that lands just below a whole number
        nbytes = round(float(parts[8]) * multiplier)
        count[srcip] += nbytes
for srcip, nbytes in count.most_common():
    print('Total bytes for ip', srcip, 'is', nbytes)
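If the per-IP frequency from the question is still wanted, the same loop can track it alongside the byte totals in a single pass. The following is a minimal sketch rather than a drop-in replacement: the function name `summarize` is hypothetical, the sample lines are embedded in a string instead of being read from the file, and lines with fewer than ten whitespace-separated fields are assumed not to be flow records and are skipped.

```python
from collections import Counter

# Sample copied from the question; the real file has thousands of lines.
SAMPLE = """\
Date first seen Duration Proto Src IP Addr:Port Dst IP Addr:Port Packets Bytes Flows
2017-04-11 07:23:17.880 929.748 UDP 172.217.9.133:443 -> 205.166.231.250:41138 3019 3.3 M 1
2017-04-11 07:38:40.994 6.676 TCP 205.251.24.253:443 -> 205.166.231.250:24723 16 4758 1
2017-04-11 07:38:40.994 6.676 TCP 205.251.24.253:443 -> 205.166.231.250:24723 16 4758 1
2017-04-11 07:38:41.258 6.508 TCP 52.197.234.56:443 -> 205.166.231.250:13712 14 7273 1
2017-04-11 07:38:41.258 6.508 TCP 52.197.234.56:443 -> 205.166.231.250:13712 14 7273 1
Summary: total flows: 22709, total bytes: 300760728, total packets: 477467
"""


def summarize(lines):
    """Return (bytes per source IP, flow lines per source IP) as two Counters."""
    total_bytes = Counter()
    hits = Counter()
    rows = iter(lines)
    next(rows, None)  # skip the header line
    for line in rows:
        if "Summary:" in line:
            break
        parts = line.split()
        if len(parts) < 10:
            continue  # assumption: shorter lines are not flow records
        src_ip = parts[4].rsplit(":", 1)[0]
        # A value such as '3.3 M' occupies two fields, so 'M' is in parts[9].
        multiplier = 10**6 if parts[9] == "M" else 1
        total_bytes[src_ip] += round(float(parts[8]) * multiplier)
        hits[src_ip] += 1
    return total_bytes, hits


totals, hits = summarize(SAMPLE.splitlines())
for ip, nbytes in totals.most_common():
    print("Total bytes for ip", ip, "is", nbytes, "seen", hits[ip], "times")
```

To run it against the real file, an open file object can be passed in place of `SAMPLE.splitlines()`, since the function only needs an iterable of lines.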
Upvotes: 1