Khalil Al Hooti
Khalil Al Hooti

Reputation: 4506

find count of groupby when other columns have certain values

I have a data-frame like below

    CHOICE 1            CHOICE 2        CHOICE 3            Gender
0   Chemistry           Earth Science   Applied Chemistry   M
1   Computer Science    Earth Science   Chemistry           M
2   Earth Science       Earth Science   Chemistry           M
3   Earth Science       Earth Science   Chemistry           F
4   Earth Science       Earth Science   Chemistry           F
5   Environmental       Biology         Earth Science       M
6   Geophysics          Earth Science   Computer Science    M
7   Earth Science       Earth Science   Earth Science       M
8   Earth Science       Earth Science   Earth Science       M
9   Earth Science       Earth Science   Earth Science       M
10  Earth Science       Earth Science   Earth Science       M
11  Earth Science       Earth Science   Earth Science       M
12  Earth Science       Earth Science   Earth Science       M
13  Earth Science       Earth Science   Earth Science       M
14  Earth Science       Earth Science   Earth Science       M
15  Biotechnology       Earth Science   Environmental       M
16  Chemistry           Earth Science   Environmental       M
17  Statistics          Earth Science   Environmental       M
18  Earth Science       Earth Science   Geophysics          F
19  Environmental       Biology         Earth Science       F
20  Geophysics          Earth Science   Geophysics          M
21  Chemistry           Earth Science   Mathematics         M
22  Chemistry           Earth Science   Mathematics         M
23  Statistics          Earth Science   Mathematics         F
24  Statistics          Earth Science   Mathematics         F
25  Geophysics          Earth Science   Physics             M
26  Chemistry           Earth Science   Statistics          M
27  Earth Science       Earth Science   Statistics          M
28  Earth Science       Earth Science   Statistics          M
29  Earth Science       Earth Science   Statistics          M
30  Geophysics          Earth Science   Statistics          F
31  Earth Science       Geophysics      Chemistry           M
32  Earth Science       Geophysics      Chemistry           M
33  Earth Science       Geophysics      Computer Science    M
34  Chemistry           Geophysics      Earth Science       M
35  Earth Science       Geophysics      Earth Science       M
36  Geophysics          Geophysics      Earth Science       F
37  Earth Science       Geophysics      Mathematics         F
38  Earth Science       Geophysics      Mathematics         M
39  Earth Science       Geophysics      Physics             M
40  Earth Science       Geophysics      Physics             M
41  Earth Science       Geophysics      Physics             M
42  Earth Science       Geophysics      Physics             M
43  Earth Science       Geophysics      Statistics          M
44  Earth Science       ENVR            BIOT                F
45  Earth Science       Geophysics      CHEM                M
46  Earth Science       Earth Science   Earth Science       M
47  Earth Science       Earth Science   Earth Science       M
48  Earth Science       Earth Science   Earth Science       F
49  Earth Science       Earth Science   Earth Science       F
50  Earth Science       Earth Science   Earth Science       F

I want to calculate counts per gender for each of the three other columns when their value is either Earth Science or Geophysics.

The final output should look like this.

Major          Gender
                         CHOICE 1     CHOICE 2      CHOICE 2
               M           count       count         count
Earth Science  
               F            count      count         count

               M            count      count         count
Geophysics
               F            count      count         count

Could someone please help me to achieve the desired output.

Thanks in advance

Upvotes: 1

Views: 50

Answers (1)

jezrael
jezrael

Reputation: 862581

First use DataFrame.melt, then filter by DataFrame.query and then reshape by GroupBy.size with Series.unstack:

df1 = (df.melt('Gender')
         .query('value in ["Earth Science", "Geophysics"]')
         .groupby(['value', 'Gender','variable'])
         .size()
         .unstack())
print (df1)
variable              CHOICE 1  CHOICE 2  CHOICE 3
value         Gender                              
Earth Science F              8         9         5
              M             25        25        13
Geophysics    F              2         2         1
              M              3        12         1

If want columns from MultiIndex:

df1 = (df.melt('Gender')
         .query('value in ["Earth Science", "Geophysics"]')
         .groupby(['value', 'Gender','variable'])
         .size()
         .unstack()
         .reset_index()
         .rename_axis(None, axis=1))
print (df1)
           value Gender  CHOICE 1  CHOICE 2  CHOICE 3
0  Earth Science      F         8         9         5
1  Earth Science      M        25        25        13
2     Geophysics      F         2         2         1
3     Geophysics      M         3        12         1

Another solution with DataFrame.pivot_table:

df1 = (df.melt('Gender')
         .query('value in ["Earth Science", "Geophysics"]')
         .pivot_table(index=['value', 'Gender'],columns='variable', aggfunc='size')
         .reset_index()
         .rename_axis(None, axis=1))
print (df1)
           value Gender  CHOICE 1  CHOICE 2  CHOICE 3
0  Earth Science      F         8         9         5
1  Earth Science      M        25        25        13
2     Geophysics      F         2         2         1
3     Geophysics      M         3        12         1

Upvotes: 3

Related Questions