Graham Streich

Reputation: 924

Create new categorical variable based on multiple binary columns

I have a data frame with many binary variables, and I would like to create a new categorical variable whose values are derived from several of them.

My dataframe looks like this:

gov_winner    corp_winner    in part
        1              0           0
        0              1           0
        0              0           1

The variable I would like to create is called winning_party and would look like this:

gov_winner    corp_winner    in part    winning_party
        1              0           0             gov
        0              1           0            corp
        0              0           1         in part

I started trying the following code but haven't had success yet:

 harrington_citations = harrington_citations.assign(winning_party=lambda x: x['gov_winner'] 
 == 1 then x = 'gov' else x == 0)

Using anky_91's answer I get the following error:

TypeError: can't multiply sequence by non-int of type 'str'

Upvotes: 1

Views: 1261

Answers (3)

jezrael

Reputation: 863611

If there is always exactly one 1 per row, use DataFrame.dot. You can also first keep only the columns that contain nothing but 0 and 1:

df1 = df.loc[:, df.isin([0,1,'0','1']).all()].astype(int)
df['Winner_Party'] = df1.dot(df1.columns)

If a row can contain several 1s and you need all matching column names, append a separator to each name and then strip the trailing one:

df['Winner_Party'] = df1.dot(df1.columns + ',').str.rstrip(',')

print (df)
   gov_winner  corp_winner  in part Winner_Party
0           1            0        0   gov_winner
1           0            1        0  corp_winner
2           0            0        1      in part
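
The question asks for short labels such as gov rather than gov_winner. As a minimal sketch of the same dot approach, assuming the labels can be obtained by dropping a "_winner" suffix (that naming rule is an assumption, and the second row is given two 1s only to show the separator at work):

```python
import pandas as pd

# Toy frame mirroring the question; the second row has two 1s on purpose.
df = pd.DataFrame({
    "gov_winner": [1, 1, 0],
    "corp_winner": [0, 1, 0],
    "in part": [0, 0, 1],
})

# Assumed labelling rule: drop the "_winner" suffix from the column names.
labels = df.columns.str.replace("_winner", "", regex=False)

# An int times a str repeats the string, so 1 keeps the label and 0 yields "".
# The trailing separator left by the last match is stripped afterwards.
df["Winner_Party"] = df.dot(labels + ",").str.rstrip(",")
print(df)
```

Because the labels are computed before the new column is added, the dot product only ever sees the three binary columns.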

Upvotes: 1

BENY

Reputation: 323376

How about idxmax? Note that it only returns the first maximum, so if several cells in a row equal 1 you may want to try jezrael's solution instead.

df['Winner_Party']=df.eq(1).idxmax(1)
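
A small sketch of that caveat, using made-up rows: a row with two 1s gets the first matching column, and a row with no 1 at all also gets the first column unless it is masked out. The names flags, first_hit and winner are only illustrative.

```python
import pandas as pd

# Made-up rows: one clean match, one tie, and one row with no 1 at all.
df = pd.DataFrame({
    "gov_winner": [1, 1, 0],
    "corp_winner": [0, 1, 0],
    "in part": [0, 0, 0],
})

flags = df.eq(1)

# idxmax returns the first column holding the row maximum, so the tie in
# row 1 resolves to "gov_winner", and so does the all-False row 2.
first_hit = flags.idxmax(axis=1)

# Keep the label only where the row actually contains a 1; otherwise NaN.
winner = first_hit.where(flags.any(axis=1))
print(winner)
```

Masking with where is one way to avoid mislabelling rows that have no winner; whether such rows exist in the real data is not stated in the question.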

Upvotes: 3

anky

Reputation: 75150

You can use a dot product:

df.assign(Winner_Party=df.dot(df.columns))
# Equivalent, using the matrix-multiplication operator:
# df.assign(Winner_Party=df @ df.columns)

   gov_winner  corp_winner  in_part Winner_Party
0           1            0        0   gov_winner
1           0            1        0  corp_winner
2           0            0        1      in_part
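
The "can't multiply sequence by non-int of type 'str'" error quoted in the question is consistent with the frame containing at least one non-numeric column, since the dot product then multiplies a string by a column name. The question does not show the full frame, so this is only a plausible cause. A minimal sketch of restricting the dot product to the binary columns, where case_name is a hypothetical text column added for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "case_name": ["A v. B", "C v. D", "E v. F"],  # hypothetical text column
    "gov_winner": [1, 0, 0],
    "corp_winner": [0, 1, 0],
    "in part": [0, 0, 1],
})

# Name the flag columns explicitly so text columns never enter the product.
flag_cols = ["gov_winner", "corp_winner", "in part"]
flags = df[flag_cols].astype(int)

# assign returns a new frame; the original df is left unchanged.
out = df.assign(Winner_Party=flags.dot(flags.columns))
print(out)
```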

Upvotes: 3
