Create Boolean Columns in Pandas Dataframe using Dictionary

Question

I am using a network trace dataset and have loaded the initial data into a pandas dataframe, which includes sport (source port) and dport (destination port) columns.

I have created a Python dict mapping common port numbers to application names, like:

port_dict = {80: 'http', 20: 'ftp', 21: 'ftp'}

I want to add one column to the dataframe for each unique value in port_dict. A service's column should be True if either sport or dport is one of the ports that map to that service, and False otherwise.

For example, with 443 mapped to 'https' in the dict, the https column should be True for a row whose sport is 443.
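Spelled out directly, the requested rule is: a service's column is True when sport or dport is one of that service's ports. Below is a minimal sketch of that rule using Series.isin in a plain loop, assuming integer sport and dport columns. The sample rows and the ports_by_service dict are made up for illustration; this is not the accepted answer's method.

```python
import pandas as pd

port_dict = {80: 'http', 20: 'ftp', 21: 'ftp'}

# Hypothetical sample trace; only the sport/dport columns matter here
df = pd.DataFrame({'sport': [1, 2, 80, 20, 8080], 'dport': [1, 80, 2, 21, 5]})

# Invert the dict: service name -> list of its ports, e.g. 'ftp' -> [20, 21]
ports_by_service = {}
for port, service in port_dict.items():
    ports_by_service.setdefault(service, []).append(port)

# One boolean column per service: True if either port belongs to that service
for service, ports in ports_by_service.items():
    df[service] = df['sport'].isin(ports) | df['dport'].isin(ports)

print(df)
```

Inverting the dict first matters because several ports can map to the same service (20 and 21 are both ftp), so each service needs the full list of its ports.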

How would I go about accomplishing this?

chrisb · Accepted Answer

Try this out. Series.map should be a fast way to look up each port in the dictionary; ports that are not in it become NaN. pandas.get_dummies turns a single column of data into one indicator column per distinct value (1s/0s, or True/False in newer pandas), which I convert to bool. Doing this for both sport and dport and combining the two results with an elementwise or (|) tells you whether the service was on either port.
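To see what the two building blocks produce, here is a small sketch on a single port column. The values are the sport column from the df.head() output further down; names and dummies are just illustrative variable names.

```python
import pandas as pd

port_dict = {80: 'http', 20: 'ftp', 21: 'ftp'}
sport = pd.Series([1, 2, 80, 20, 1], name='sport')

# map(): each port becomes its service name, or NaN if the port is not in the dict
names = sport.map(port_dict)
print(names.tolist())  # [nan, nan, 'http', 'ftp', nan]

# get_dummies(): one column per distinct non-NaN name (sorted); NaN rows are all False
dummies = pd.get_dummies(names).astype(bool)
print(dummies)
```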

import pandas as pd

# One bool column per service for each port column; | marks a service found on either port
services = pd.get_dummies(df['sport'].map(port_dict)).astype(bool) | pd.get_dummies(df['dport'].map(port_dict)).astype(bool)

df[services.columns] = services

In [166]: df.head()
Out[166]: 
   dport  sport    ftp   http
0      1      1  False  False
1     80      2  False   True
2      2     80  False   True
3      3     20   True  False
4      1      1  False  False
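Here is a self-contained sketch of the whole approach, using the five rows from the df.head() output above as sample data. One change from the answer, as an assumption: each get_dummies result is reindexed to the full, sorted list of service names from port_dict. That way both frames have identical columns before they are combined with |, and a service that never occurs still gets an all-False column. The port_hits helper is hypothetical.

```python
import pandas as pd

port_dict = {80: 'http', 20: 'ftp', 21: 'ftp'}
# Sample data: the five rows shown in the df.head() output above
df = pd.DataFrame({'dport': [1, 80, 2, 3, 1], 'sport': [1, 2, 80, 20, 1]})

# Every service name in the dict, sorted, so both sides get the same columns
service_names = sorted(set(port_dict.values()))

def port_hits(ports: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: one bool column per service, True where the port maps to it."""
    dummies = pd.get_dummies(ports.map(port_dict))  # unknown ports (NaN) give all-0 rows
    return dummies.reindex(columns=service_names, fill_value=False).astype(bool)

services = port_hits(df['sport']) | port_hits(df['dport'])
df = df.join(services)  # append the ftp/http columns
print(df)
```

Here dport only ever maps to http, so without the reindex its dummies would have no ftp column. Reindexing keeps the | step from depending on how pandas aligns frames whose columns differ.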
