George
George

Reputation: 5681

add new column and calculate ratio

I am learning pandas and I want to create a new colum to my data ( I am using the national names data ).

I am just using 1880 and 1881 years.

          name sex  births  year
0        Mary   F    7065  1880
1        Anna   F    2604  1880
2        Emma   F    2003  1880
3   Elizabeth   F    1939  1880
4      Worthy   M       5  1880
5      Wright   M       5  1880
6        York   M       5  1880
7   Zachariah   M       5  1880
8        Mary   F    6919  1881
9        Anna   F    2698  1881
10       Emma   F    2034  1881
11  Elizabeth   F    1852  1881
12     Wilton   M       5  1881
13       Wing   M       5  1881
14       Wood   M       5  1881
15     Wright   M       5  1881

I am creating the total births data:

total_births = names.pivot_table('births', index='year', columns='sex', aggfunc=sum)

which gives:

sex       F   M
year           
1880  13611  20
1881  13503  20

Now, I want to create another column in the data where I will put the ratio of births per year to the total births per year.

For example:

name  sex births     year ratio
Mary   F   7065      1880  7065/13611
Wilton M     5       1881   5/13503

I am trying:

new = (names.groupby(['year', 'sex'])).assign(ratio= (names.groupby(['year','sex'])).names['births'] / total_births )

which gives:

AttributeError: Cannot access callable attribute 'assign' of 'DataFrameGroupBy' objects, try using the 'apply' method

OR

I tried to break :

ratio = names.groupby(['year','sex'])
ratio1 = ratio.loc[:,'births']

but it gives:

AttributeError: Cannot access callable attribute 'loc' of 'DataFrameGroupBy' objects, try using the 'apply' method

Upvotes: 2

Views: 3120

Answers (1)

jezrael
jezrael

Reputation: 863196

I think you need groupby with transform sum and then divide by div:

rat = names.groupby(['year','sex'])['births'].transform('sum')
print (rat)
0     13611
1     13611
2     13611
3     13611
4        20
5        20
6        20
7        20
8     13503
9     13503
10    13503
11    13503
12       20
13       20
14       20
15       20
Name: births, dtype: int64
names['ratio'] = names.births.div(rat)
print (names)
         name sex  births  year     ratio
0        Mary   F    7065  1880  0.519065
1        Anna   F    2604  1880  0.191316
2        Emma   F    2003  1880  0.147160
3   Elizabeth   F    1939  1880  0.142458
4      Worthy   M       5  1880  0.250000
5      Wright   M       5  1880  0.250000
6        York   M       5  1880  0.250000
7   Zachariah   M       5  1880  0.250000
8        Mary   F    6919  1881  0.512405
9        Anna   F    2698  1881  0.199807
10       Emma   F    2034  1881  0.150633
11  Elizabeth   F    1852  1881  0.137155
12     Wilton   M       5  1881  0.250000
13       Wing   M       5  1881  0.250000
14       Wood   M       5  1881  0.250000
15     Wright   M       5  1881  0.250000

Solution with assign:

names = names.assign(ratio=lambda x: x.births.div(rat))
print (names)
         name sex  births  year     ratio
0        Mary   F    7065  1880  0.519065
1        Anna   F    2604  1880  0.191316
2        Emma   F    2003  1880  0.147160
3   Elizabeth   F    1939  1880  0.142458
4      Worthy   M       5  1880  0.250000
5      Wright   M       5  1880  0.250000
6        York   M       5  1880  0.250000
7   Zachariah   M       5  1880  0.250000
8        Mary   F    6919  1881  0.512405
9        Anna   F    2698  1881  0.199807
10       Emma   F    2034  1881  0.150633
11  Elizabeth   F    1852  1881  0.137155
12     Wilton   M       5  1881  0.250000
13       Wing   M       5  1881  0.250000
14       Wood   M       5  1881  0.250000
15     Wright   M       5  1881  0.250000

Upvotes: 4

Related Questions