shantanuo

Reputation: 32226

Wrong results for group by in pandas

import pandas as pd
names1880 = pd.read_csv('test.txt', names=['name', 'sex', 'births'])
names1880.groupby('sex').births.sum()

I am using a sample file of US baby names for the year 1880. The expected result is something like this:

F 90993
M 110493
Name: births

But I am getting a seemingly random total for each row instead:

0     58385
1     35818
2     33920
...
1896    57
1897    57
1898    57

How do I get the correct totals for male and female?

Update: The following code seems to work as expected. Does that mean I have to store the groupby object in a variable first, and cannot chain the calls in one expression?

mygroup=names1880.groupby('sex')
mygroup['births'].sum()

Here are the first 10 lines of the test.txt file:

Mary,F,7065
Anna,F,2604
Emma,F,2003
Elizabeth,F,1939
Minnie,F,1746
Margaret,F,1578
Ida,F,1472
Alice,F,1414
Bertha,F,1320
Sarah,F,1288

I am using pandas version 0.7.0, if that matters.

Upvotes: 0

Views: 1677

Answers (1)

DavidK

Reputation: 2564

What you wrote works for me. When I copy the data sample you gave:

import pandas as pd
data = pd.read_clipboard(sep=',', header=None,
                         names=['name', 'sex', 'births'])

data.groupby('sex').births.sum()

It prints:

sex
F      22429

You don't have to split the expression up, although you always can. Your issue may be caused by your pandas version, which is very old.
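To check this on your own install, here is a minimal sketch that prints the installed pandas version and compares the attribute form with the bracket form. The small DataFrame is made up from a few rows of the sample, and I am assuming a reasonably recent pandas; I have not tested this on 0.7.0.

```python
import pandas as pd

# The question reports 0.7.0; print whatever is installed here.
print(pd.__version__)

# A tiny frame built from four rows of the sample data (illustration only).
df = pd.DataFrame({
    "name": ["Mary", "Anna", "Jeremy", "Jonathan"],
    "sex": ["F", "F", "M", "M"],
    "births": [7065, 2604, 1477, 1255],
})

# Attribute access and bracket selection pick the same column.
by_attribute = df.groupby("sex").births.sum()
by_bracket = df.groupby("sex")["births"].sum()

# Both forms should give identical per-sex totals.
print(by_attribute.equals(by_bracket))
print(by_bracket)
```

The bracket form is the more explicit of the two, since it cannot be confused with a method or attribute of the groupby object.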

When I add some male names:

Mary,F,7065
Anna,F,2604
Emma,F,2003
Elizabeth,F,1939
Minnie,F,1746
Margaret,F,1578
Ida,F,1472
Jeremy,M,1477
Alice,F,1414
Bertha,F,1320
Sarah,F,1288
Jonathan,M,1255

Here is what it prints, as expected:

sex
F      22429
M       2732
Name: births, dtype: int64
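If you would rather not depend on the clipboard, the same check can be sketched with the sample embedded in the script. This assumes a recent pandas and uses io.StringIO in place of test.txt; the variable names sample and totals are mine.

```python
import io

import pandas as pd

# The twelve sample rows from above, embedded so the script is self-contained.
sample = """Mary,F,7065
Anna,F,2604
Emma,F,2003
Elizabeth,F,1939
Minnie,F,1746
Margaret,F,1578
Ida,F,1472
Jeremy,M,1477
Alice,F,1414
Bertha,F,1320
Sarah,F,1288
Jonathan,M,1255
"""

# The data has no header row, so supply the column names explicitly.
data = pd.read_csv(io.StringIO(sample), header=None,
                   names=["name", "sex", "births"])

# Total births per sex.
totals = data.groupby("sex")["births"].sum()
print(totals)
```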

Upvotes: 1
