Reputation: 5681
I am learning pandas and I want to create a new colum to my data ( I am using the national names data ).
I am just using 1880 and 1881 years.
name sex births year
0 Mary F 7065 1880
1 Anna F 2604 1880
2 Emma F 2003 1880
3 Elizabeth F 1939 1880
4 Worthy M 5 1880
5 Wright M 5 1880
6 York M 5 1880
7 Zachariah M 5 1880
8 Mary F 6919 1881
9 Anna F 2698 1881
10 Emma F 2034 1881
11 Elizabeth F 1852 1881
12 Wilton M 5 1881
13 Wing M 5 1881
14 Wood M 5 1881
15 Wright M 5 1881
I am creating the total births data:
total_births = names.pivot_table('births', index='year', columns='sex', aggfunc=sum)
which gives:
sex F M
year
1880 13611 20
1881 13503 20
Now, I want to create another column in the data where I will put the ratio of births per year to the total births per year.
For example:
name sex births year ratio
Mary F 7065 1880 7065/13611
Wilton M 5 1881 5/13503
I am trying:
new = (names.groupby(['year', 'sex'])).assign(ratio= (names.groupby(['year','sex'])).names['births'] / total_births )
which gives:
AttributeError: Cannot access callable attribute 'assign' of 'DataFrameGroupBy' objects, try using the 'apply' method
OR
I tried to break :
ratio = names.groupby(['year','sex'])
ratio1 = ratio.loc[:,'births']
but it gives:
AttributeError: Cannot access callable attribute 'loc' of 'DataFrameGroupBy' objects, try using the 'apply' method
Upvotes: 2
Views: 3120
Reputation: 863196
I think you need groupby
with transform
sum
and then divide by div
:
rat = names.groupby(['year','sex'])['births'].transform('sum')
print (rat)
0 13611
1 13611
2 13611
3 13611
4 20
5 20
6 20
7 20
8 13503
9 13503
10 13503
11 13503
12 20
13 20
14 20
15 20
Name: births, dtype: int64
names['ratio'] = names.births.div(rat)
print (names)
name sex births year ratio
0 Mary F 7065 1880 0.519065
1 Anna F 2604 1880 0.191316
2 Emma F 2003 1880 0.147160
3 Elizabeth F 1939 1880 0.142458
4 Worthy M 5 1880 0.250000
5 Wright M 5 1880 0.250000
6 York M 5 1880 0.250000
7 Zachariah M 5 1880 0.250000
8 Mary F 6919 1881 0.512405
9 Anna F 2698 1881 0.199807
10 Emma F 2034 1881 0.150633
11 Elizabeth F 1852 1881 0.137155
12 Wilton M 5 1881 0.250000
13 Wing M 5 1881 0.250000
14 Wood M 5 1881 0.250000
15 Wright M 5 1881 0.250000
Solution with assign
:
names = names.assign(ratio=lambda x: x.births.div(rat))
print (names)
name sex births year ratio
0 Mary F 7065 1880 0.519065
1 Anna F 2604 1880 0.191316
2 Emma F 2003 1880 0.147160
3 Elizabeth F 1939 1880 0.142458
4 Worthy M 5 1880 0.250000
5 Wright M 5 1880 0.250000
6 York M 5 1880 0.250000
7 Zachariah M 5 1880 0.250000
8 Mary F 6919 1881 0.512405
9 Anna F 2698 1881 0.199807
10 Emma F 2034 1881 0.150633
11 Elizabeth F 1852 1881 0.137155
12 Wilton M 5 1881 0.250000
13 Wing M 5 1881 0.250000
14 Wood M 5 1881 0.250000
15 Wright M 5 1881 0.250000
Upvotes: 4