How to divide columns to create new column

Question

I created a dataframe that counts the number of unique states with at least 1 non null value and the total number of non null values grouped by year. But I'm having trouble adding an additional column that divides the total number of non null values and count of unique states.

Here is some sample data:

nan = float('nan')
df = pd.DataFrame({
    'year': [2018, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019],
    'state': [1, 2, 3, 1, 2, 1, 2, 3, 4, 5],
    'var1': [nan, 1.0, nan, 1.0, nan, nan, 1.0, nan, 2.0, 2.0],
    'var2': [2.0, 1.0, nan, 2.0, 1.0, nan, 1.0, nan, 1.0, nan],
})

And this is my code so far:

import numpy as np

c = df.groupby(['year', 'state']).count()
res = c.groupby('year').agg([np.count_nonzero, sum])

My output currently looks like this:

        var1                    var2
        count_nonzero   sum     count_nonzero   sum
year                
2018    2               2       2               4
2019    3               3       2               2

This is what I want it to look like:

        var1                                    var2
        count_nonzero   sum     average         count_nonzero   sum    average
year                
2018    2               2       1               2               4      2
2019    3               3       1               2               2      1

How can I modify the code above to do this?

Ynjxsjmh · Accepted Answer

Let's try

out = (res.join(res.xs('sum', level=1, axis=1, drop_level=False)
                .div(res.xs('count_nonzero', level=1, axis=1, drop_level=False).values)
                .rename(columns={'sum':'average'}))
       .sort_index(axis=1))

print(out)

        var1                      var2
     average count_nonzero sum average count_nonzero sum
year
2018     1.0             2   2     2.0             2   4
2019     1.0             3   3     1.0             2   2

How to divide columns to create new column

Answers (1)

Related Questions