Reputation: 4506
I have a data-frame like below
CHOICE 1 CHOICE 2 CHOICE 3 Gender
0 Chemistry Earth Science Applied Chemistry M
1 Computer Science Earth Science Chemistry M
2 Earth Science Earth Science Chemistry M
3 Earth Science Earth Science Chemistry F
4 Earth Science Earth Science Chemistry F
5 Environmental Biology Earth Science M
6 Geophysics Earth Science Computer Science M
7 Earth Science Earth Science Earth Science M
8 Earth Science Earth Science Earth Science M
9 Earth Science Earth Science Earth Science M
10 Earth Science Earth Science Earth Science M
11 Earth Science Earth Science Earth Science M
12 Earth Science Earth Science Earth Science M
13 Earth Science Earth Science Earth Science M
14 Earth Science Earth Science Earth Science M
15 Biotechnology Earth Science Environmental M
16 Chemistry Earth Science Environmental M
17 Statistics Earth Science Environmental M
18 Earth Science Earth Science Geophysics F
19 Environmental Biology Earth Science F
20 Geophysics Earth Science Geophysics M
21 Chemistry Earth Science Mathematics M
22 Chemistry Earth Science Mathematics M
23 Statistics Earth Science Mathematics F
24 Statistics Earth Science Mathematics F
25 Geophysics Earth Science Physics M
26 Chemistry Earth Science Statistics M
27 Earth Science Earth Science Statistics M
28 Earth Science Earth Science Statistics M
29 Earth Science Earth Science Statistics M
30 Geophysics Earth Science Statistics F
31 Earth Science Geophysics Chemistry M
32 Earth Science Geophysics Chemistry M
33 Earth Science Geophysics Computer Science M
34 Chemistry Geophysics Earth Science M
35 Earth Science Geophysics Earth Science M
36 Geophysics Geophysics Earth Science F
37 Earth Science Geophysics Mathematics F
38 Earth Science Geophysics Mathematics M
39 Earth Science Geophysics Physics M
40 Earth Science Geophysics Physics M
41 Earth Science Geophysics Physics M
42 Earth Science Geophysics Physics M
43 Earth Science Geophysics Statistics M
44 Earth Science ENVR BIOT F
45 Earth Science Geophysics CHEM M
46 Earth Science Earth Science Earth Science M
47 Earth Science Earth Science Earth Science M
48 Earth Science Earth Science Earth Science F
49 Earth Science Earth Science Earth Science F
50 Earth Science Earth Science Earth Science F
I want to calculate counts per gender for each of the three other columns when their value is either Earth Science or Geophysics.
The final output should look like this.
Major Gender
CHOICE 1 CHOICE 2 CHOICE 2
M count count count
Earth Science
F count count count
M count count count
Geophysics
F count count count
Could someone please help me to achieve the desired output.
Thanks in advance
Upvotes: 1
Views: 50
Reputation: 862581
First use DataFrame.melt
, then filter by DataFrame.query
and then reshape by GroupBy.size
with Series.unstack
:
df1 = (df.melt('Gender')
.query('value in ["Earth Science", "Geophysics"]')
.groupby(['value', 'Gender','variable'])
.size()
.unstack())
print (df1)
variable CHOICE 1 CHOICE 2 CHOICE 3
value Gender
Earth Science F 8 9 5
M 25 25 13
Geophysics F 2 2 1
M 3 12 1
If want columns from MultiIndex
:
df1 = (df.melt('Gender')
.query('value in ["Earth Science", "Geophysics"]')
.groupby(['value', 'Gender','variable'])
.size()
.unstack()
.reset_index()
.rename_axis(None, axis=1))
print (df1)
value Gender CHOICE 1 CHOICE 2 CHOICE 3
0 Earth Science F 8 9 5
1 Earth Science M 25 25 13
2 Geophysics F 2 2 1
3 Geophysics M 3 12 1
Another solution with DataFrame.pivot_table
:
df1 = (df.melt('Gender')
.query('value in ["Earth Science", "Geophysics"]')
.pivot_table(index=['value', 'Gender'],columns='variable', aggfunc='size')
.reset_index()
.rename_axis(None, axis=1))
print (df1)
value Gender CHOICE 1 CHOICE 2 CHOICE 3
0 Earth Science F 8 9 5
1 Earth Science M 25 25 13
2 Geophysics F 2 2 1
3 Geophysics M 3 12 1
Upvotes: 3