Reputation: 13
I am having trouble creating dummy variables from a dataset like this one:
x = pd.DataFrame({'Temp':['Hot','Cold','Warm','Cold'],'Temp_2':[np.nan,'Warm','Cold',np.nan]})
Note that the values are the same in both variables (Hot, Cold or Warm).
   Temp Temp_2
0   Hot    NaN
1  Cold   Warm
2  Warm   Cold
3  Cold    NaN
My problem is that pd.get_dummies does not take this relationship into account and encodes the two variables independently:
   Temp_Cold  Temp_Hot  Temp_Warm  Temp_2_Cold  Temp_2_Warm
0          0         1          0            0            0
1          1         0          0            0            1
2          0         0          1            1            0
3          1         0          0            0            0
Is there a way to encode it so that I get this instead?
   Cold  Hot  Warm
0     0    1     0
1     1    0     1
2     1    0     1
3     1    0     0
Thanks,
Upvotes: 0
Views: 861
Reputation: 474
You can do something like this:
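import numpy as np
import pandas as pd

x = pd.DataFrame({'Temp':['Hot','Cold','Warm','Cold'],'Temp_2':[np.nan,'Warm','Cold',np.nan]})
print(x)
# Give both columns the same (empty) prefix so their dummies share names
a = pd.get_dummies(x, prefix=['',''])
# Sum the columns that have the same name
b = a.groupby(lambda name: name, axis=1).sum()
print(b)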
It is not very clean, but it works. Passing the same empty prefix for both columns gives the dummies generated from Temp and Temp_2 identical names, so the matching columns can be summed together.
   _Cold  _Hot  _Warm
0      0     1      0
1      1     0      1
2      1     0      1
3      1     0      0
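As far as I know, passing axis=1 to groupby is deprecated in recent pandas releases, so here is a rough variant of the same idea that avoids it. It is only a sketch: the names `dummies` and `combined` are made up for illustration. It also assumes you want plain 0/1 indicators, so it takes the max rather than the sum (a row with the same value in both columns would otherwise give 2), and it sets prefix_sep to an empty string to drop the leading underscore.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame({'Temp': ['Hot', 'Cold', 'Warm', 'Cold'],
                  'Temp_2': [np.nan, 'Warm', 'Cold', np.nan]})

# Empty prefix and separator: both source columns produce identical
# dummy names (Cold, Hot, Warm); dtype=int gives 0/1 instead of booleans.
dummies = pd.get_dummies(x, prefix='', prefix_sep='', dtype=int)

# Transpose so the duplicated names become the index, group the rows
# that share a name, keep the max (1 if either column had the value),
# then transpose back to the original orientation.
combined = dummies.T.groupby(level=0).max().T
print(combined)
```

For the sample data this should give the Cold / Hot / Warm table asked for in the question. Replace max() with sum() if you want counts instead of indicators.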
Upvotes: 1