Calculations on a pandas DataFrame column conditional on another column

Question

I have seen several 'set value of new column based on value of another'-type questions, but as far as I can tell, none of them address dividing values within the same column based on conditions set by another column.

My data looks like the table below, minus the column (variable) 'healthpertotal'.

The column 'function' says whether a row holds government spending (aka expenditure) on
a) health, or
b) everything combined (total spending).
The amount itself is in the column 'value', and the associated year of that spending is in the column 'year'.

I want to make a new column that shows government health spending as a percentage of total spending for a given year, as shown below in the column 'healthpertotal'.

So for instance, in 1995, the value of this variable is (42587(health spending amount)/326420(total spending amount))*100=13.05.
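To make the arithmetic concrete, here is a minimal plain-Python check of that formula for all three years, using the amounts from the table below. The dictionary names `health`, `total` and `percent` are just illustrative and are not part of my actual data:

```python
# Health and total spending per year, copied from the 'value' column below
health = {1995: 42587, 1996: 44209, 1997: 44472}
total = {1995: 326420, 1996: 333637, 1997: 340252}

# (health / total) * 100, rounded to two decimals, for each year
percent = {year: round(health[year] / total[year] * 100, 2) for year in health}

print(percent)  # {1995: 13.05, 1996: 13.25, 1997: 13.07}
```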

As for the rows showing total spending, the 'healthpertotal' could be 'missing', 1, or 'not applicable' and the like. I am ok with any of these options.

How would I set up this new column 'healthpertotal' using python?

The DataFrame I would like to end up with is set up below (with the final variable 'healthpertotal' artificially 'forced' in by hand):

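import numpy as np
import pandas as pd

# 'healthpertotal' is typed in by hand here; it is the column I want to compute
data = {'function': ['Health'] * 3 + ['Total'] * 3,
        'year': [1995, 1996, 1997, 1995, 1996, 1997],
        'value': [42587, 44209, 44472, 326420, 333637, 340252],
        'healthpertotal': [13.05, 13.25, 13.07] + [np.nan] * 3
        }

df = pd.DataFrame(data)

print(df)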

Expected outcome:

  function  year   value  healthpertotal
0   Health  1995   42587           13.05
1   Health  1996   44209           13.25
2   Health  1997   44472           13.07
3    Total  1995  326420             NaN
4    Total  1996  333637             NaN
5    Total  1997  340252             NaN
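
For reference, one possible way to compute this column rather than typing it in is sketched below. This is only a sketch, and it assumes there is exactly one 'Total' row per year (otherwise the year lookup is ambiguous). The helper names `totals` and `share` are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'function': ['Health'] * 3 + ['Total'] * 3,
    'year': [1995, 1996, 1997, 1995, 1996, 1997],
    'value': [42587, 44209, 44472, 326420, 333637, 340252],
})

# Series of total spending indexed by year (assumes one 'Total' row per year)
totals = df.loc[df['function'] == 'Total'].set_index('year')['value']

# Divide every row's value by the total for that row's year
share = (df['value'] / df['year'].map(totals) * 100).round(2)

# Keep the percentage on 'Health' rows only; 'Total' rows become NaN
df['healthpertotal'] = share.where(df['function'] == 'Health')

print(df)
```

Mapping 'year' onto the totals keeps the calculation on the original long-format table; pivoting so that 'Health' and 'Total' become separate columns would be another option.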
