Reputation: 21
I have IP packet data in a CSV file, and I'm trying to extract the sequence numbers from the Info field into a separate column that contains only the sequence numbers. The sequence number is a substring in the middle of the Info string. Here is my rough code. First I create a new column for the sequence numbers, then I check whether the Info field contains a Seq number, then I split the Info field so that I get only the sequence number. If I print after 'Seq = j.split...', I do get the correct values. How do I write them to the Seq column of the CSV file?
import pandas as pd

file = pd.read_csv('file.csv')
file['Seq'] = None
for i in file['Info']:
    if 'Seq' in i:
        split = i.split(' ')
        for j in split:
            if 'Seq=' in j:
                Seq = j.split('Seq=',1)[1]
                file.loc[i,'Seq'] = int(Seq)
Example CSV:
No. Time Source Destination Protocol Length Info
1 0.000000 sourceip 192.168.0.1 TCP 54 35165 > 80 [SYN] Seq=0 Win=16384 Len=0
2 0.000001 sourceip 192.168.0.1 TCP 54 14378 > 80 [SYN] Seq=0 Win=16384 Len=0
3 0.000003 sourceip 192.168.0.1 TCP 54 31944 > 80 [SYN] Seq=0 Win=16384 Len=0
Desired outcome:
No. Time Source Destination Protocol Length Info Seq
1 0.000000 sourceip 192.168.0.1 TCP 54 35165 > 80 [SYN] Seq=0 Win=16384 Len=0 0
2 0.000001 sourceip 192.168.0.1 TCP 54 14378 > 80 [SYN] Seq=0 Win=16384 Len=0 0
3 0.000003 sourceip 192.168.0.1 TCP 54 31944 > 80 [SYN] Seq=0 Win=16384 Len=0 0
Upvotes: 2
Views: 88
Reputation: 294218
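Use str.extract with a capture group. Rows without a match come back as NaN, which is why the result is cast to float rather than int:
file['Seq'] = file.Info.str.extract(r'Seq=(\d+)', expand=False).astype(float)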
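For a fuller picture, here is a minimal sketch of the same str.extract approach on a small frame, followed by writing the result back out as CSV. The sample rows, the name df, the in-memory buffer standing in for an output file, and the switch to the nullable Int64 dtype are all assumptions for illustration and are not part of the original data or answer.

```python
import io

import pandas as pd

# Hypothetical sample modelled on the question's Info column.
# The last row deliberately has no "Seq=" token.
df = pd.DataFrame({
    "No.": [1, 2, 3],
    "Info": [
        "35165 > 80 [SYN] Seq=0 Win=16384 Len=0",
        "14378 > 80 [ACK] Seq=17 Ack=1 Win=16384 Len=0",
        "Who has 192.168.0.1? Tell 192.168.0.2",
    ],
})

# Raw string so that \d reaches the regex engine unchanged.
# expand=False returns a Series for the single capture group.
seq = df["Info"].str.extract(r"Seq=(\d+)", expand=False)

# Unmatched rows are NaN. Going through float and then the nullable
# Int64 dtype keeps matched values as integers and the rest as <NA>.
df["Seq"] = seq.astype(float).astype("Int64")

# Write the frame, including the new column, as CSV.
# The buffer is a stand-in for a real path such as "out.csv".
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

If every row is known to contain a Seq value, a plain astype(int) would also work. The Int64 route is only one option for keeping integers alongside missing values.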
Upvotes: 2