a.awaji
a.awaji

Reputation: 32

Dataframe filtering in pandas

How can I filter or subset a particular group within a dataframe (e.g., admitted female from the dataframe below)? I am trying to sum up admissions/rejection rates based on gender. This dataframe is small, but what if it was much larger, let's say for example tens of thousands of line, where indexing individual values is impossible?

      Admit  Gender Dept  Freq
0   Admitted    Male    A   512
1   Rejected    Male    A   313
2   Admitted  Female    A    89
3   Rejected  Female    A    19
4   Admitted    Male    B   353
5   Rejected    Male    B   207
6   Admitted  Female    B    17
7   Rejected  Female    B     8
8   Admitted    Male    C   120
9   Rejected    Male    C   205
10  Admitted  Female    C   202
11  Rejected  Female    C   391
12  Admitted    Male    D   138
13  Rejected    Male    D   279
14  Admitted  Female    D   131
15  Rejected  Female    D   244
16  Admitted    Male    E    53
17  Rejected    Male    E   138
18  Admitted  Female    E    94
19  Rejected  Female    E   299
20  Admitted    Male    F    22
21  Rejected    Male    F   351
22  Admitted  Female    F    24
23  Rejected  Female    F   317

Upvotes: 0

Views: 2107

Answers (1)

Romain
Romain

Reputation: 21898

To filter the data you can use the very comprehensive queryfunction.

# Test data
df = DataFrame({'Admit': ['Admitted', 'Rejected', 'Admitted', 'Rejected', 'Admitted', 'Rejected', 'Admitted'],
        'Gender': ['Male', 'Male', 'Female', 'Female', 'Male', 'Male', 'Female'],
        'Freq': [512, 313, 89, 19, 353, 207, 17],
        'Gender Dept': ['A', 'A', 'A', 'A', 'B', 'B', 'B']})

df.query('Admit == "Admitted" and Gender == "Female"')

      Admit  Freq  Gender Gender Dept
2  Admitted    89  Female           A
6  Admitted    17  Female           B

To summarize data use groupby.

group = df.groupby(['Admit', 'Gender']).sum()
print(group)

                 Freq
Admit    Gender      
Admitted Female   106
         Male     865
Rejected Female    19
         Male     520

You can the filter the result simply by subsetting on the created MultiIndex.

group.loc[('Admitted', 'Female')]

Freq    106
Name: (Admitted, Female), dtype: int64

Upvotes: 2

Related Questions