Create Columns for Count for Each Variable Pandas

Question

After using value_counts and some other data cleaning, I have my data in the form:

year  city    category  count_per_city
2005  NYC     1         145
2007  ATL     1         75
2005  NYC     2         55
2006  LA      3         40

I'd like to convert it to this:

year  city  1    2   3   total 
2005  NYC   145  55  0   200
2006  LA    0    0   40  40
2007  ATL   75   0   0   75

I feel like there is a relatively simple way to do this that I'm missing.

tdy · Accepted Answer

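You can use pivot_table() with margins=True, which adds a 'total' row and column, and fill_value=0, which fills missing year/city/category combinations with 0. The trailing drop('total') removes the margin row and keeps the margin column: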

out = df.pivot_table(
    index=['year', 'city'],
    columns='category',
    aggfunc='sum',
    fill_value=0,
    margins=True,
    margins_name='total'
).drop('total')

#            count_per_city              
# category                1   2   3 total
# year  city                             
# 2005  NYC             145  55   0   200
# 2006  LA                0   0  40    40
# 2007  ATL              75   0   0    75
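
To try this end to end, here is a minimal sketch that rebuilds the sample data from the question by hand and runs the same call. The DataFrame construction is an assumption about how the data is stored (integer counts, integer category labels); the original post only shows the printed table.

```python
import pandas as pd

# Sample data copied from the question; the dtypes are an assumption
df = pd.DataFrame({
    "year": [2005, 2007, 2005, 2006],
    "city": ["NYC", "ATL", "NYC", "LA"],
    "category": [1, 1, 2, 3],
    "count_per_city": [145, 75, 55, 40],
})

out = df.pivot_table(
    index=["year", "city"],
    columns="category",
    aggfunc="sum",
    fill_value=0,          # missing (year, city, category) combinations become 0
    margins=True,          # adds a 'total' row and a 'total' column
    margins_name="total",
).drop("total")            # drop the margin row, keep the margin column

print(out)
```

Because no values argument is passed, the remaining column (count_per_city) becomes the top level of the column MultiIndex, which is why the printed header has two rows.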

If you want the exact output from the OP, you can do some cleanup (thanks to @HenryEcker):

out.droplevel(0, axis=1).rename_axis(columns=None).reset_index()

#    year city    1   2   3  total
# 0  2005  NYC  145  55   0    200
# 1  2006   LA    0   0  40     40
# 2  2007  ATL   75   0   0     75
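
As a variation on the cleanup above, passing values='count_per_city' to pivot_table() should keep the columns single-level, so the droplevel step is not needed. This is a sketch under the assumption that the data looks like the sample in the question (rebuilt here by hand), not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question; the dtypes are an assumption
df = pd.DataFrame({
    "year": [2005, 2007, 2005, 2006],
    "city": ["NYC", "ATL", "NYC", "LA"],
    "category": [1, 1, 2, 3],
    "count_per_city": [145, 75, 55, 40],
})

out = (
    df.pivot_table(
        index=["year", "city"],
        columns="category",
        values="count_per_city",   # a single values column avoids the extra column level
        aggfunc="sum",
        fill_value=0,
        margins=True,
        margins_name="total",
    )
    .drop("total")                 # remove the margin row
    .rename_axis(columns=None)     # remove the 'category' label on the column axis
    .reset_index()                 # turn year and city back into ordinary columns
)

print(out)
```

Note that the category columns keep their integer labels (1, 2, 3), so they are selected with out[1] rather than out['1'].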
