Jeff
Jeff

Reputation: 4003

using pandas two perform Cross tab between one column and two others that share the same enumerated values

I have table with

|Majors|First Minor|Second Minor|
|Philosophy|English|Null|
|Math|Philosophy|English|
|English|Theology|Political Science
|Philosophy|Theology|English|

The possible values of the "First Minor" and "Second Minor" columns are the same (English, Philosophy, Math, etc)

I want to see the correlation between "Major" and the "Minors" belonging to those majors. I don't care if the minor is a first or second minor. So based on the above sample I want an output that looks something like:

|Major|Philosophy|Math|English
First Minor||||     
Philosophy|0|1|0|
|Math|0|0|0|
|English|2|1|0|
Political Science|0|0|1|
Theology|1|0|1|
Null|1|0|0

The following code generates the desired result for just the "First Minor" or the "Second Minor":

ct = pd.crosstab(newdf.MajorInterMinor, newdf["First Minor"])

But I want to get the results for both columns ("First Minor" and "Second Minor") and I can't figure out how to get the combined result.

Any help/advice is much appreciated!

Upvotes: 0

Views: 183

Answers (2)

wwnde
wwnde

Reputation: 26676

pd.melt to collapse FirstMinor, SecondMinor into one column and make new df

 df2 = pd.melt(df,id_vars=['Majors'], value_vars=['FirstMinor','SecondMinor'],var_name='myVarname', value_name='FirstMinor_SecondMinor')

Slice new df to get_dummies on Majors. Join this back to the df. Groupby and sum

pd.DataFrame(df2.loc[:,'FirstMinor_SecondMinor']).join(pd.get_dummies(df2.loc[:,'Majors'])).groupby('FirstMinor_SecondMinor').agg(sum)

   

                          English  Math  Philosophy
FirstMinor_SecondMinor                           
English                       0     1           2
Null                          0     0           1
Philosophy                    0     1           0
PoliticalScience              1     0           0
Theology                      1     0           1

Upvotes: 1

mozway
mozway

Reputation: 260790

melt the Minors, then apply crosstab:

df2 = df.melt(id_vars='Majors', value_name='Minors')

pd.crosstab(df2['Majors'], df2['Minors'])

Output:

Minors      English  Null  Philosophy  Political Science  Theology
Majors                                                            
English           0     0           0                  1         1
Math              1     0           1                  0         0
Philosophy        2     1           0                  0         1

Upvotes: 1

Related Questions