Martin

Reputation: 141

One hot encoding - encode multiple columns as one

I want to encode a dataframe that has multiple columns of the same "type", for example:

import pandas as pd

df = pd.DataFrame(data=[["France", "Bupapest", "Sweden", "Paris"], ["Italy", "Frankfurt", "France", "Naples"]], columns=["Countries 1", "Cities 1", "Countries 2", "Cities 2"])
print(df)

Output:

  Countries 1   Cities 1 Countries 2 Cities 2
0      France   Bupapest      Sweden    Paris
1       Italy  Frankfurt      France   Naples

How do I one-hot encode this dataframe by passing in the column indices that should be treated as a single feature? In this example, I would pass in [0, 2] and [1, 3], because the Countries 1 and Countries 2 columns together contain 3 different countries and should therefore share 3 categories, not 2 each. The same principle applies to the two cities columns.

Upvotes: 3

Views: 979

Answers (1)

BENY

Reputation: 323226

I would use wide_to_long to flatten the df, then factorize + unstack:

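s = (pd.wide_to_long(df.reset_index(), stubnames=['Countries', 'Cities'], i='index', j='unstack', sep=' ')
       .apply(lambda x: pd.factorize(x)[0] + 1)  # one shared set of integer codes per stub, starting at 1
       .unstack())

# flatten the (stub, suffix) column pairs back to names like "Countries 1"
s.columns = s.columns.map('{0[0]} {0[1]}'.format)

s = s.reindex(columns=df.columns)  # restore the original column order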
s
Out[1377]: 
       Countries 1  Cities 1  Countries 2  Cities 2
index                                              
0                1         1            3         3
1                2         2            1         4
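
Since the question asks about passing column indices such as [0, 2] and [1, 3] rather than relying on the column names, here is a minimal sketch of the same shared-code idea driven by index groups. The helper name encode_shared and its groups parameter are hypothetical, not part of pandas; only pd.factorize and ordinary indexing are used.

```python
import pandas as pd

df = pd.DataFrame(
    data=[["France", "Bupapest", "Sweden", "Paris"],
          ["Italy", "Frankfurt", "France", "Naples"]],
    columns=["Countries 1", "Cities 1", "Countries 2", "Cities 2"],
)


def encode_shared(frame, groups):
    """Hypothetical helper: give every group of column positions one shared set of codes."""
    out = frame.copy()
    for positions in groups:
        cols = list(frame.columns[positions])
        # Read the grouped columns one after another (column-major order),
        # so codes are assigned in order of first appearance across the group.
        values = frame[cols].to_numpy().ravel(order="F")
        codes, _ = pd.factorize(values)
        codes = codes.reshape(len(frame), len(cols), order="F")
        for k, col in enumerate(cols):
            out[col] = codes[:, k] + 1  # start codes at 1, as in the answer above
    return out


encoded = encode_shared(df, [[0, 2], [1, 3]])
print(encoded)
```

On the example frame this should give the same integer codes as the wide_to_long version, without needing the "Countries"/"Cities" name pattern.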

Or use get_dummies on the long frame:

s=pd.get_dummies(pd.wide_to_long(df.reset_index(),stubnames=['Countries','Cities'],i='index',j='unstack',sep=' '))

s
Out[1392]: 
               Countries_France  Countries_Italy  Countries_Sweden  \
index unstack                                                        
0     1                       1                0                 0   
1     1                       0                1                 0   
0     2                       0                0                 1   
1     2                       1                0                 0   
               Cities_Bupapest  Cities_Frankfurt  Cities_Naples  Cities_Paris  
index unstack                                                                  
0     1                      1                 0              0             0  
1     1                      0                 1              0             0  
0     2                      0                 0              0             1  
1     2                      0                 0              1             0  
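
The get_dummies result above has one row per (index, unstack) pair. If one row per original record is wanted instead, a possible follow-up (my assumption about the desired shape, not something the question confirms) is to collapse the long frame with a groupby on the index level. The variable names long_df, dummies and per_row are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    data=[["France", "Bupapest", "Sweden", "Paris"],
          ["Italy", "Frankfurt", "France", "Naples"]],
    columns=["Countries 1", "Cities 1", "Countries 2", "Cities 2"],
)

# Long form: one "Countries" and one "Cities" column, indexed by (index, unstack)
long_df = pd.wide_to_long(
    df.reset_index(),
    stubnames=["Countries", "Cities"],
    i="index",
    j="unstack",
    sep=" ",
)

# One indicator column per distinct country / city across both original columns
dummies = pd.get_dummies(long_df, dtype=int)

# Collapse back to one row per original record: an indicator is 1 if the
# value appeared in any of the grouped columns of that row.
per_row = dummies.groupby(level="index").max()
print(per_row)
```

Note that taking the max discards which of the two columns a value came from; if that position matters, keep the long form or unstack the "unstack" level instead.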

Upvotes: 2
