Reputation: 32226
names1880 = pd.read_csv('test.txt', names=['name', 'sex', 'births'])
names1880.groupby('sex').births.sum()
I am using a sample file for the year 1880 (US baby names). The expected result is something like this:
F 90993
M 110493
Name: births
But I am getting a seemingly random total for each row instead:
0 58385
1 35818
2 33920
...
1896 57
1897 57
1898 57
How do I get correct results for male and female?
Update: The following code seems to work as expected. Does it mean that I have to break the expression apart and cannot use it as a single chained call?
mygroup=names1880.groupby('sex')
mygroup['births'].sum()
Here are the first 10 lines of the test.txt file:
Mary,F,7065
Anna,F,2604
Emma,F,2003
Elizabeth,F,1939
Minnie,F,1746
Margaret,F,1578
Ida,F,1472
Alice,F,1414
Bertha,F,1320
Sarah,F,1288
I am using pandas version 0.7.0 if that matters.
Upvotes: 0
Views: 1677
Reputation: 2564
What you wrote works fine. When I copy the data sample you gave and run:
import pandas as pd
data = pd.read_clipboard(sep=',', header=None,
                         names=['name', 'sex', 'births'])
data.groupby('sex').births.sum()
It prints:
sex
F 22429
You don't have to break anything, but you always can! (Maybe your issue is that your pandas version is too old.)
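Since the pandas version is the suspect here, it may be worth confirming which version is actually being imported. A minimal check (the variable name `installed_version` is just for illustration):

```python
import pandas as pd

# Report the version of the pandas installation that is actually imported.
installed_version = pd.__version__
print(installed_version)
```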
When I add some men:
Mary,F,7065
Anna,F,2604
Emma,F,2003
Elizabeth,F,1939
Minnie,F,1746
Margaret,F,1578
Ida,F,1472
Jeremy,M,1477
Alice,F,1414
Bertha,F,1320
Sarah,F,1288
Jonathan,M,1255
Here is what it prints, as expected:
sex
F 22429
M 2732
Name: births, dtype: int64
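If you want to reproduce this without the clipboard, here is a self-contained sketch that reads the same twelve rows from an in-memory string. It assumes a reasonably recent pandas; the names `sample`, `by_attribute` and `by_brackets` are mine. It also compares the attribute form from the question with the bracket form from the update, which should give the same totals:

```python
import io

import pandas as pd

# The same twelve rows as above, held in memory instead of the clipboard.
sample = """Mary,F,7065
Anna,F,2604
Emma,F,2003
Elizabeth,F,1939
Minnie,F,1746
Margaret,F,1578
Ida,F,1472
Jeremy,M,1477
Alice,F,1414
Bertha,F,1320
Sarah,F,1288
Jonathan,M,1255
"""

data = pd.read_csv(io.StringIO(sample), header=None,
                   names=['name', 'sex', 'births'])

# Attribute access (as in the question) and bracket access (as in the update).
by_attribute = data.groupby('sex').births.sum()
by_brackets = data.groupby('sex')['births'].sum()

print(by_brackets)
print(by_attribute.equals(by_brackets))
```

The bracket form is the safer of the two in general, because it still works when a column name collides with an existing attribute or method name.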
Upvotes: 1