Reputation: 33
I created a dataframe that counts the number of unique states with at least 1 non null value and the total number of non null values grouped by year. But I'm having trouble adding an additional column that divides the total number of non null values and count of unique states.
Here is some sample data:
nan = float('nan')
df = pd.DataFrame({
'year': [2018, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019],
'state': [1, 2, 3, 1, 2, 1, 2, 3, 4, 5],
'var1': [nan, 1.0, nan, 1.0, nan, nan, 1.0, nan, 2.0, 2.0],
'var2': [2.0, 1.0, nan, 2.0, 1.0, nan, 1.0, nan, 1.0, nan],
})
And this is my code so far:
import numpy as np
c = df.groupby(['year', 'state']).count()
res = c.groupby('year').agg([np.count_nonzero, sum])
My output currently looks like this:
var1 var2
count_nonzero sum count_nonzero sum
year
2018 2 2 2 4
2019 3 3 2 2
This is what I want it to look like:
var1 var2
count_nonzero sum average count_nonzero sum average
year
2018 2 2 1 2 4 2
2019 3 3 1 2 2 1
How can I modify the code above to do this?
Upvotes: 0
Views: 57
Reputation: 30022
Let's try
out = (res.join(res.xs('sum', level=1, axis=1, drop_level=False)
.div(res.xs('count_nonzero', level=1, axis=1, drop_level=False).values)
.rename(columns={'sum':'average'}))
.sort_index(axis=1))
print(out)
var1 var2
average count_nonzero sum average count_nonzero sum
year
2018 1.0 2 2 2.0 2 4
2019 1.0 3 3 1.0 2 2
Upvotes: 1