Reputation: 23
I'm trying to figure out how to count a number of occurrences in the DataFrame using multiple criteria. In this particular example, I'd like to know the number of female passengers in Pclass 3.
PassengerId Pclass Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 892 3 male 34.5 0 0 330911 7.8292 NaN Q
1 893 3 female 47.0 1 0 363272 7.0000 NaN S
2 894 2 male 62.0 0 0 240276 9.6875 NaN Q
3 895 3 male 27.0 0 0 315154 8.6625 NaN S
4 896 3 female 22.0 1 1 3101298 12.2875 NaN S
Here's my few failed attempts:
len(test[test["Sex"] == "female", test["Pclass"] == 3])
sum(test.Pclass == 3 & test.Sex == "female")
test.[test["Sex"] == "female", test["Pclass"] == 3].count()
None of them seem to be working. At the end I've created my own function, but there must be a simpler way to calculate that.
def countif(sex, pclass):
x = 0
for i in range(0,len(test)):
s = test.iloc[i]['Sex']
c = test.iloc[i]['Pclass']
if s == sex and c == pclass:
x = x + 1
return x
Thank you in advance
Upvotes: 2
Views: 1293
Reputation: 47008
There are a few ways to do this:
test = pd.DataFrame({'PassengerId': {0: 892, 1: 893, 2: 894, 3: 895, 4: 896},
'Pclass': {0: 3, 1: 3, 2: 2, 3: 3, 4: 3},
'Sex': {0: 'male', 1: 'female', 2: 'male', 3: 'male', 4: 'female'},
'Age': {0: 34.5, 1: 47.0, 2: 62.0, 3: 27.0, 4: 22.0},
'SibSp': {0: 0, 1: 1, 2: 0, 3: 0, 4: 1},
'Parch': {0: 0, 1: 0, 2: 0, 3: 0, 4: 1},
'Ticket': {0: 330911, 1: 363272, 2: 240276, 3: 315154, 4: 3101298},
'Fare': {0: 7.8292, 1: 7.0, 2: 9.6875, 3: 8.6625, 4: 12.2875},
'Cabin': {0: np.nan, 1: np.nan, 2: np.nan, 3: np.nan, 4: np.nan},
'Embarked': {0: 'Q', 1: 'S', 2: 'Q', 3: 'S', 4: 'S'}})
You need to put the boolean in round brackets and join with an &
sum((test.Pclass == 3) & (test.Sex == "female"))
len(test[(test.Pclass == 3) & (test.Sex == "female")])
test[(test["Sex"] == "female") & (test["Pclass"] == 3)].shape[0]
Or you can do:
tab = pd.crosstab(df.Pclass,df.Sex)
Sex female male
Pclass
2 0 1
3 2 2
tab.iloc[tab.index==3]['female']
Upvotes: 1