Bernat

Reputation: 13

Create Dummy variables from multiple variables in python

I am having trouble creating dummy variables from a dataset like this one:

x = pd.DataFrame({'Temp':['Hot','Cold','Warm','Cold'],'Temp_2':[np.nan,'Warm','Cold',np.nan]})

Note that the values are the same in both variables (Hot, Cold or Warm).

    Temp    Temp_2
0   Hot     NaN
1   Cold    Warm
2   Warm    Cold
3   Cold    NaN

My problem is that pd.get_dummies does not take this relationship into account and encodes the two variables independently:

    Temp_Cold   Temp_Hot    Temp_Warm   Temp_2_Cold      Temp_2_Warm
0       0           1           0              0               0
1       1           0           0              0               1
2       0           0           1              1               0
3       1           0           0              0               0

Is there a way to encode it so that I get this instead?

    Cold    Hot Warm
0     0      1    0
1     1      0    1
2     1      0    1
3     1      0    0

Thanks,

Upvotes: 0

Views: 861

Answers (1)

aurelien_morel

Reputation: 474

You can do something like this:

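import numpy as np
import pandas as pd

x = pd.DataFrame({'Temp':['Hot','Cold','Warm','Cold'],'Temp_2':[np.nan,'Warm','Cold',np.nan]})
print(x)
# An empty prefix for each column gives both sets of dummies the same names (_Cold, _Hot, _Warm)
a = pd.get_dummies(x, prefix=['',''])
# Sum the columns that share a name (note: groupby(axis=1) is deprecated as of pandas 2.1)
b = a.groupby(lambda col: col, axis=1).sum()
print(b)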

It is not very clean, but it works. Passing an empty prefix for each column gives the dummies generated from Temp and Temp_2 the same names, so they can be grouped and summed.

   _Cold  _Hot  _Warm
0      0     1      0
1      1     0      1
2      1     0      1
3      1     0      0
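
As a possible alternative, here is a minimal sketch that avoids the leading underscore in the column names and does not rely on groupby(axis=1). It assumes that passing prefix='' together with prefix_sep='' suits your data, and it takes the maximum instead of the sum, so a row where Temp and Temp_2 hold the same value would still get 1 rather than 2. The variable names dummies and combined are only for illustration.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame({
    'Temp': ['Hot', 'Cold', 'Warm', 'Cold'],
    'Temp_2': [np.nan, 'Warm', 'Cold', np.nan],
})

# Empty prefix and separator: dummy columns are named after the raw values,
# so Temp and Temp_2 produce duplicate column names (Cold, Hot, Warm, Cold, Warm).
dummies = pd.get_dummies(x, prefix='', prefix_sep='', dtype=int)

# Transpose so the duplicate names become index labels, group equal labels,
# take the maximum per label, then transpose back.
combined = dummies.T.groupby(level=0).max().T

print(combined)
```

On the sample frame this should produce the Cold / Hot / Warm table asked for in the question. If you would rather count how many of the source columns contain each value, swapping max() for sum() is the only change needed.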

Upvotes: 1
