Reputation: 924
I have a DataFrame with many binary variables, and I would like to create a new categorical variable based on several of these binary variables.
My DataFrame looks like this:
gov_winner  corp_winner  in part
1           0            0
0           1            0
0           0            1
The variable I would like to create is called winning_party, and it would look like this:
gov_winner  corp_winner  in part  winning_party
1           0            0        gov
0           1            0        corp
0           0            1        in part
I started with the following code, but I haven't had success yet:
harrington_citations = harrington_citations.assign(winning_party=lambda x: x['gov_winner'] == 1 then x = 'gov' else x == 0)
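The attempt above uses SQL-style then/else, which is not valid Python. A minimal sketch of the same assign idea, assuming the three flag columns hold integer 0/1 values, uses numpy.select to pick one label per row. The sample data and the 'none' default for rows with no flag set are illustrative assumptions, not part of the question.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question (assumed to hold integer 0/1 values)
harrington_citations = pd.DataFrame({
    'gov_winner':  [1, 0, 0],
    'corp_winner': [0, 1, 0],
    'in part':     [0, 0, 1],
})

# np.select checks the conditions in order and takes the first match per row;
# 'none' is a hypothetical default for rows where no flag is set
harrington_citations = harrington_citations.assign(
    winning_party=lambda x: np.select(
        [x['gov_winner'] == 1, x['corp_winner'] == 1, x['in part'] == 1],
        ['gov', 'corp', 'in part'],
        default='none',
    )
)
print(harrington_citations)
```

Because np.select takes the first matching condition, a row with several 1s gets the label of the earliest condition in the list.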
Using anky_91's answer, I get the following error:
TypeError: can't multiply sequence by non-int of type 'str'
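This error typically appears when the flag columns hold strings such as '0' and '1' (object dtype) instead of integers: the dot product then evaluates '1' * 'gov_winner', and Python cannot multiply a string by a string. The sketch below reproduces that situation under the assumption that the flags were read in as strings (for example from a CSV) and shows that casting them to int makes the dot product work.

```python
import pandas as pd

# Assumption: the flags are stored as strings, so the columns have object dtype
df = pd.DataFrame({
    'gov_winner':  ['1', '0', '0'],
    'corp_winner': ['0', '1', '0'],
    'in part':     ['0', '0', '1'],
})

error_message = None
try:
    df.dot(df.columns)  # evaluates '1' * 'gov_winner', which raises TypeError
except TypeError as exc:
    error_message = str(exc)
print('TypeError:', error_message)

# With integer flags each product is int * str:
# 1 * 'gov_winner' -> 'gov_winner' and 0 * 'gov_winner' -> ''
fixed = df.astype(int)
df['winning_party'] = fixed.dot(fixed.columns)
print(df)
```

The same error also occurs if the frame contains any non-numeric column, which is why filtering the 0/1 columns first, as in jezrael's answer, helps.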
Upvotes: 1
Views: 1261
Reputation: 863611
If there is always exactly one 1 per row, use DataFrame.dot. You can also filter first so that only the columns containing 0 and 1 values are kept:
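# keep only columns whose values are all 0/1 (as int or string), then cast them to int
df1 = df.loc[:, df.isin([0,1,'0','1']).all()].astype(int)
# each 1 multiplied by its column name yields that name, so each row gets its winning column
df['Winner_Party'] = df1.dot(df1.columns)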
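To see why the column filter matters, here is a sketch on a hypothetical frame that mixes integer flags, a flag stored as strings, and a text column named case_name (an invented column, not from the question). The isin mask drops case_name, and astype(int) normalizes the string flags before the dot product.

```python
import pandas as pd

# Hypothetical frame: binary flags (one stored as strings) plus a text column
df = pd.DataFrame({
    'case_name':   ['A v. B', 'C v. D', 'E v. F'],
    'gov_winner':  [1, 0, 0],
    'corp_winner': ['0', '1', '0'],
    'in part':     [0, 0, 1],
})

# Boolean mask over the columns: True only where every value is 0 or 1
mask = df.isin([0, 1, '0', '1']).all()
df1 = df.loc[:, mask].astype(int)

# Each row has exactly one 1, so the dot product returns that column's name
df['Winner_Party'] = df1.dot(df1.columns)
print(df)
```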
But if a row can contain multiple 1s and you need all the matching column names, append a separator to each column name and then strip the trailing one:
df['Winner_Party'] = df1.dot(df1.columns + ',').str.rstrip(',')
print(df)
gov_winner corp_winner in part Winner_Party
0 1 0 0 gov_winner
1 0 1 0 corp_winner
2 0 0 1 in part
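The output above only has one 1 per row. A sketch of the separator variant on hypothetical data where the second row has two flags set shows how the names are concatenated; a row with no 1 at all would end up as an empty string.

```python
import pandas as pd

# Hypothetical data: the second row has two flags set
df = pd.DataFrame({
    'gov_winner':  [1, 1, 0],
    'corp_winner': [0, 1, 0],
    'in part':     [0, 0, 1],
})

# Appending ',' to every column name makes the dot product concatenate
# 'name,' once for each 1 in the row; rstrip then drops the trailing comma
df['Winner_Party'] = df.dot(df.columns + ',').str.rstrip(',')
print(df)
```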
Upvotes: 1
Reputation: 323376
How about idxmax? Note that it only selects the first maximum, so if a row can have multiple cells equal to 1, you may want to try Jez's solution instead.
df['Winner_Party']=df.eq(1).idxmax(1)
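Two behaviors of idxmax are worth checking on hypothetical data: a row with two 1s gets only the first matching column, and a row with no 1 still gets the first column name, because the maximum of an all-False row is False at position 0. A minimal sketch that blanks out such rows with where and any (an addition, not part of the answer above):

```python
import pandas as pd

# Hypothetical data: row 1 has two 1s, row 2 has none
df = pd.DataFrame({
    'gov_winner':  [1, 1, 0],
    'corp_winner': [0, 1, 0],
    'in part':     [0, 0, 0],
})

flags = df.eq(1)
# idxmax(axis=1) returns the label of the first True in each row
winner = flags.idxmax(axis=1)
# An all-False row would still get 'gov_winner', so replace it with NaN
df['Winner_Party'] = winner.where(flags.any(axis=1))
print(df)
```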
Upvotes: 3
Reputation: 75150
You can use a dot product:
df.assign(Winner_Party=df.dot(df.columns))
#df.assign(Winner_Party=df @ df.columns)
gov_winner corp_winner in_part Winner_Party
0 1 0 0 gov_winner
1 0 1 0 corp_winner
2 0 0 1 in_part
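The dot product returns the column names themselves, while the question asks for shorter labels such as gov and corp. One way to get those, sketched here with a hypothetical labels dictionary, is to rename the columns before taking the product; the @ operator on a DataFrame calls DataFrame.dot, as the commented line in the answer suggests.

```python
import pandas as pd

df = pd.DataFrame({
    'gov_winner':  [1, 0, 0],
    'corp_winner': [0, 1, 0],
    'in_part':     [0, 0, 1],
})

# Hypothetical mapping from column names to the labels shown in the question
labels = {'gov_winner': 'gov', 'corp_winner': 'corp', 'in_part': 'in part'}

# Renaming the columns first means the dot product returns the short labels;
# 'flags @ flags.columns' is equivalent to 'flags.dot(flags.columns)'
flags = df.rename(columns=labels)
df = df.assign(winning_party=flags @ flags.columns)
print(df)
```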
Upvotes: 3