Reputation: 143
I am using a network trace dataset, and have loaded the initial data into a pandas dataframe, which looks like this:
I have created a Python dict that maps common port numbers to application names, like
port_dict = {80: 'http', 20: 'ftp', 21: 'ftp'}
I want to modify my dataframe by adding columns whose names are the unique values of port_dict. If either sport or dport contains the relevant key, the newly added column should have the value True, and False otherwise, like this:
In the above picture, the column https should be True because the sport is 443.
How would I go about accomplishing this?
Upvotes: 0
Views: 5381
Reputation: 52236
Try this out. Series.map should be a faster way to look up values from the dictionary. pandas.get_dummies turns a single column of data into one column per distinct value, filled with 1s and 0s, which I convert to bool. The sport and dport results are then combined with or (|) to get whether the service was on either port.
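names = sorted(set(port_dict.values()))  # one column per service name
# reindex so both frames share the same columns even if a service never shows up on one side
on_sport = pd.get_dummies(df['sport'].map(port_dict)).reindex(columns=names, fill_value=False).astype(bool)
on_dport = pd.get_dummies(df['dport'].map(port_dict)).reindex(columns=names, fill_value=False).astype(bool)
df[names] = on_sport | on_dport
In [166]: df.head()
Out[166]:
   dport  sport    ftp   http
0      1      1  False  False
1     80      2  False   True
2      2     80  False   True
3      3     20   True  False
4      1      1  False  False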
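For comparison, here is a minimal sketch of the same "one boolean column per service" idea written with Series.isin instead of get_dummies. The sample dataframe and the extra 443 entry are assumptions added for illustration (443 only echoes the https example from the question), and ports_by_service is a hypothetical helper name, not something from the original post.

```python
import pandas as pd

# 443 is added here only to mirror the https example in the question
port_dict = {80: 'http', 20: 'ftp', 21: 'ftp', 443: 'https'}
# hypothetical sample data
df = pd.DataFrame({'sport': [1, 2, 443, 20], 'dport': [1, 80, 2, 3]})

# Invert the mapping: service name -> list of ports that belong to it
ports_by_service = {}
for port, name in port_dict.items():
    ports_by_service.setdefault(name, []).append(port)

# One boolean column per service: True if either port matches
for name, ports in ports_by_service.items():
    df[name] = df['sport'].isin(ports) | df['dport'].isin(ports)

print(df)
```

This loops over services rather than rows, so it should stay reasonably fast when the dictionary is small and the dataframe is large, and every service gets a column even if it never appears in the data.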
Upvotes: 2
Reputation: 81594
If I may, I'd suggest simply having a single service column: if sport or dport is among the port_dict keys, the matching value is written to the service column, and False otherwise:
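import pandas as pd

port_dict = {80: 'http', 20: 'ftp', 21: 'ftp'}
# a small example dataframe
df = pd.DataFrame(data={'sport': [1, 2, 80, 20], 'dport': [1, 80, 2, 3]})
df['service'] = None  # object column, so it can hold both False and strings
for i in df.index:
    # sport is checked first; fall back to dport, then to False
    # (df.at replaces df.ix, which was removed in pandas 1.0)
    found_service = port_dict.get(df.at[i, 'sport'], False) or port_dict.get(df.at[i, 'dport'], False)
    df.at[i, 'service'] = found_service
print(df)
>>
   sport  dport service
0      1      1   False
1      2     80    http
2     80      2    http
3     20      3     ftp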
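The row-by-row loop can get slow on a large trace. A possible vectorized sketch of the same single-column idea, assuming it is acceptable for unmatched rows to hold NaN rather than False, uses Series.map for the lookups and Series.combine_first for the fallback:

```python
import pandas as pd

port_dict = {80: 'http', 20: 'ftp', 21: 'ftp'}
df = pd.DataFrame({'sport': [1, 2, 80, 20], 'dport': [1, 80, 2, 3]})

# Look up each port; ports missing from the dict become NaN
by_sport = df['sport'].map(port_dict)
by_dport = df['dport'].map(port_dict)

# Prefer the sport match, fall back to dport, stay NaN when neither matches
df['service'] = by_sport.combine_first(by_dport)

print(df)
```

As in the loop version, sport wins when both ports map to a service. If False is preferred over NaN for unmatched rows, the column would need an extra fill step afterwards.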
Upvotes: 1