Akshit

Reputation: 11

Pandas counting number of rows based on data of two columns

I am working on a dataset with a format similar to this:

Name      Sex       Survived    random_cols ...
Akshit    Male      1           rand_val    ...
Hema      Female    0           ...
Rekha     Female    1           ...
...

I want to count the number of males and females who survived, i.e. who have the value 1 in the Survived column. I can do this easily with the naive counter approach shown below, but I was wondering whether pandas offers a more efficient way that takes fewer lines of code.

m = 0
f = 0
for i in range(len(train_data['Sex'])):
    if train_data['Sex'][i] == 'male' and train_data['Survived'][i] == 1:
        m = m + 1
    
    if train_data['Sex'][i] == 'female' and train_data['Survived'][i] == 1:
        f = f + 1

print(m)
print(f)

Upvotes: 0

Views: 1339

Answers (2)

Ynjxsjmh

Reputation: 30002

You can use boolean indexing on the Survived column to keep only the rows of survivors, then call value_counts on the Sex column:

s = df[df['Survived'].eq(1)].value_counts(subset=['Sex'])
print(s)

Sex
Female    1
Male      1
dtype: int64

The return value is a pandas Series, so you can access the individual counts with:

s['Male']
s['Female']
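For reference, here is a minimal self-contained sketch of the same filter-then-count idea. The three-row DataFrame is a hypothetical sample that mirrors the rows in the question, and the variable names are illustrative. It selects the Sex column first and uses Series.value_counts, which is an assumption made here so the sketch does not depend on DataFrame.value_counts being available.

```python
import pandas as pd

# Hypothetical sample mirroring the three rows shown in the question
df = pd.DataFrame({
    "Name": ["Akshit", "Hema", "Rekha"],
    "Sex": ["Male", "Female", "Female"],
    "Survived": [1, 0, 1],
})

# Boolean mask keeps only survivors; value_counts then counts each Sex value
survivor_counts = df.loc[df["Survived"].eq(1), "Sex"].value_counts()

# .get returns the default (0) when a category has no survivors at all
male_survivors = int(survivor_counts.get("Male", 0))
female_survivors = int(survivor_counts.get("Female", 0))

print(male_survivors, female_survivors)
```

Using .get with a default instead of s['Male'] avoids a KeyError when one group has no survivors and is therefore missing from the result.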

Upvotes: 1

Vishnudev Krishnadas

Reputation: 10960

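Use pandas.DataFrame.value_counts with both columns as the subset. This counts every (Sex, Survived) combination, so the entries where Survived is 1 are the survivor counts: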

train_data.value_counts(subset=['Sex', 'Survived'])
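As a rough sketch of how to pull just the survivor counts out of that result: the sample data below is hypothetical (it mirrors the rows in the question), and the xs step is an addition that is not part of the original answer. It assumes a pandas version that provides DataFrame.value_counts (1.1 or later).

```python
import pandas as pd

# Hypothetical sample mirroring the three rows shown in the question
train_data = pd.DataFrame({
    "Name": ["Akshit", "Hema", "Rekha"],
    "Sex": ["Male", "Female", "Female"],
    "Survived": [1, 0, 1],
})

# Series indexed by a (Sex, Survived) MultiIndex, one entry per combination
counts = train_data.value_counts(subset=["Sex", "Survived"])

# Take the cross-section where the Survived level equals 1;
# the result is indexed by Sex only
survivors = counts.xs(1, level="Survived")

print(survivors["Male"], survivors["Female"])
```

The full counts Series also holds the non-survivor totals, which is handy if both outcomes are needed later.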

Upvotes: 0
