Patrick Dijkhorst

Reputation: 13

How to one hot encode a list of different "columns" to a dataframe

I need to prepare my data for modelling, and I want to create a dataframe with 0-1 values in the columns. I have a list of lists of column names that I want to one-hot encode into a dataframe.

List = [['DRT', 'AFV'], ['CLN', 'DRT', 'AFV'], ['CLN', 'DRT', 'AFV'], ['BLN', 'PCK', 'CAL', 'WBL', 'BCO', 'UPG', 'CLN', 'DRT'], ['BLN', 'AFV', 'CAL', 'WBL', 'UPG', 'CLN', 'DRT'], ['COA', 'BLN', 'PCK', 'CAL', 'WBL', 'UPG', 'CLN', 'DRT'], ['COA', 'BLN', 'PCK', 'CAL', 'WBL', 'UPG', 'CLN', 'DRT']]

I want a dataframe like the one shown below, with 1 for the items that are in a list and 0 for the items that are not, and one row for each inner list. There are 28 different values in total that can appear in the lists.

[![df][1]][1]

I tried "get_dummies", but this creates separate columns such as 1_DRT ... 7_DRT because DRT appears at different positions in the dataframe. I also tried some functions from scikit-learn, but without success. I would really appreciate some help with this one.

Edit: the columns of the final dataframe with the 0-1 values are:

columns=['CLN', 'AFV', 'DRT', 'CAL', 'WBL', 'BLN', 'UPG', 'BCO', 'PCK', 'COA', 'WPK', 'WCO', '1CL', 'DRY', 'RES', 'WFR', 'FRZ', 'REC', 'CHF', 'STP', 'DFR', 'HOT', 'EXT', 'PIL', 'SPL', 'INS', 'SVT', 'UVP']

[1]: https://i.sstatic.net/nuUp9.png

Upvotes: 1

Views: 125

Answers (1)

SeaBean

Reputation: 23227

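You can create a pandas Series from List, use .explode() to put each list item on its own row, and then use .str.get_dummies() to get a dummy table with one row per exploded item. Then aggregate the rows that belong to the same original list with .groupby(level=0).max() (the shorter .max(level=0) only works on pandas versions before 2.0):

import pandas as pd
df = pd.Series(List).explode().str.get_dummies().groupby(level=0).max()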

Result:

print(df)

   AFV  BCO  BLN  CAL  CLN  COA  DRT  PCK  UPG  WBL
0    1    0    0    0    0    0    1    0    0    0
1    1    0    0    0    1    0    1    0    0    0
2    1    0    0    0    1    0    1    0    0    0
3    0    1    1    1    1    0    1    1    1    1
4    1    0    1    1    1    0    1    0    1    1
5    0    0    1    1    1    1    1    1    1    1
6    0    0    1    1    1    1    1    1    1    1
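
The table above only has columns for the codes that actually occur in List, while the question asks for all 28. A minimal sketch of one way to get there, assuming the columns list from the question's edit and using reindex to add the missing codes as all-zero columns. The intermediate names (exploded, dummies, encoded, full) are only illustrative, and List is shortened to its first four rows:

```python
import pandas as pd

# Sample data copied from the question (first four rows only, for brevity).
List = [
    ['DRT', 'AFV'],
    ['CLN', 'DRT', 'AFV'],
    ['CLN', 'DRT', 'AFV'],
    ['BLN', 'PCK', 'CAL', 'WBL', 'BCO', 'UPG', 'CLN', 'DRT'],
]

# Full column list from the question's edit.
columns = ['CLN', 'AFV', 'DRT', 'CAL', 'WBL', 'BLN', 'UPG', 'BCO', 'PCK', 'COA',
           'WPK', 'WCO', '1CL', 'DRY', 'RES', 'WFR', 'FRZ', 'REC', 'CHF', 'STP',
           'DFR', 'HOT', 'EXT', 'PIL', 'SPL', 'INS', 'SVT', 'UVP']

# One row per list item; the index repeats the position of the original list.
exploded = pd.Series(List).explode()

# One 0/1 column per distinct code, still one row per exploded item.
dummies = exploded.str.get_dummies()

# Collapse back to one row per original list.
encoded = dummies.groupby(level=0).max()

# Add codes that never occur as all-zero columns, in the requested order.
full = encoded.reindex(columns=columns, fill_value=0)

print(full.shape)  # (4, 28)
```

Passing fill_value=0 keeps the new columns as integers instead of NaN, so the frame stays a clean 0-1 matrix.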

Upvotes: 3
