Reputation: 32
How can I filter or subset a particular group within a dataframe (e.g., admitted female from the dataframe below)? I am trying to sum up admissions/rejection rates based on gender. This dataframe is small, but what if it was much larger, let's say for example tens of thousands of line, where indexing individual values is impossible?
Admit Gender Dept Freq
0 Admitted Male A 512
1 Rejected Male A 313
2 Admitted Female A 89
3 Rejected Female A 19
4 Admitted Male B 353
5 Rejected Male B 207
6 Admitted Female B 17
7 Rejected Female B 8
8 Admitted Male C 120
9 Rejected Male C 205
10 Admitted Female C 202
11 Rejected Female C 391
12 Admitted Male D 138
13 Rejected Male D 279
14 Admitted Female D 131
15 Rejected Female D 244
16 Admitted Male E 53
17 Rejected Male E 138
18 Admitted Female E 94
19 Rejected Female E 299
20 Admitted Male F 22
21 Rejected Male F 351
22 Admitted Female F 24
23 Rejected Female F 317
Upvotes: 0
Views: 2107
Reputation: 21898
To filter the data you can use the very comprehensive query
function.
# Test data
df = DataFrame({'Admit': ['Admitted', 'Rejected', 'Admitted', 'Rejected', 'Admitted', 'Rejected', 'Admitted'],
'Gender': ['Male', 'Male', 'Female', 'Female', 'Male', 'Male', 'Female'],
'Freq': [512, 313, 89, 19, 353, 207, 17],
'Gender Dept': ['A', 'A', 'A', 'A', 'B', 'B', 'B']})
df.query('Admit == "Admitted" and Gender == "Female"')
Admit Freq Gender Gender Dept
2 Admitted 89 Female A
6 Admitted 17 Female B
To summarize data use groupby
.
group = df.groupby(['Admit', 'Gender']).sum()
print(group)
Freq
Admit Gender
Admitted Female 106
Male 865
Rejected Female 19
Male 520
You can the filter the result simply by subsetting on the created MultiIndex
.
group.loc[('Admitted', 'Female')]
Freq 106
Name: (Admitted, Female), dtype: int64
Upvotes: 2