s_khan92
s_khan92

Reputation: 979

How to create correlation based on different columns in pandas?

I have data like this:

Users_id        My_Fav         Bro_Fav          Friend_Fav
User0             BMW            VW                BMW
UserA             VW             Mercedes          Honda
UserB             Honda          Honda             VW
UserC             Mercedes       BMW               Mercedes
UserD             VW             BMW               BMW

I would like output for correlation between Columns and Brands and desired output would be like this:

                      My_Fav         Bro_Fav          Friend_Fav
BMW                     1               2                2
VW                      2               1                1
Honda                   1               1                1
Mercedes                1               1                1

Upvotes: 1

Views: 42

Answers (2)

jezrael
jezrael

Reputation: 863501

You can count columns values per columns and then sum per index values, if necessary convert Users_id column to index in first step:

#Users_id is column
df = df.set_index('Users_id').apply(pd.value_counts).sum(level=0)

#Users_id is index
#df = df.apply(pd.value_counts).sum(level=0)
print (df)
          My_Fav  Bro_Fav  Friend_Fav
BMW            1        2           2
Honda          1        1           1
Mercedes       1        1           1
VW             2        1           1

Upvotes: 2

BENY
BENY

Reputation: 323366

IIUC melt + crosstab

s=df.melt('Users_id')
s=pd.crosstab(s.value,s.variable)
variable  Bro_Fav  Friend_Fav  My_Fav
value                                
BMW             2           2       1
Honda           1           1       1
Mercedes        1           1       1
VW              1           1       2

Upvotes: 1

Related Questions