Ethan

Reputation: 11

Create a Count Column for Each Category in Pandas

After using value_counts and some other data cleaning, I have my data in the form:

year  city    category  count_per_city
2005  NYC     1         145
2007  ATL     1         75
2005  NYC     2         55
2006  LA      3         40

I'd like to convert it to this:

year  city  1    2   3   total 
2005  NYC   145  55  0   200
2006  LA    0    0   40  40
2007  ATL   75   0   0   75

I feel like there is a relatively simple way to do this that I'm missing.

Upvotes: 0

Views: 50

Answers (2)

Allen Qin

Reputation: 19947

Another solution using unstack:

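(
    df.set_index(['year', 'city', 'category'])
    .unstack('category', fill_value=0)       # missing combinations become 0 instead of NaN
    .droplevel(0, axis=1)                    # drop the 'count_per_city' column level
    .assign(total=lambda x: x.sum(axis=1))   # row-wise total across the category columns
    .reset_index()
    .rename_axis(columns=None)
)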
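To try the unstack approach on the question's sample data, here is a minimal self-contained sketch. The frame is rebuilt by hand from the table in the question, and the names `df` and `wide` are just illustrative. It assumes each (year, city, category) combination appears only once, since unstack raises an error on duplicate index entries.

```python
import pandas as pd

# Sample data copied from the question (assumed to be integer counts).
df = pd.DataFrame({
    'year': [2005, 2007, 2005, 2006],
    'city': ['NYC', 'ATL', 'NYC', 'LA'],
    'category': [1, 1, 2, 3],
    'count_per_city': [145, 75, 55, 40],
})

# Move 'category' from the row index into the columns, filling gaps with 0.
wide = (
    df.set_index(['year', 'city', 'category'])['count_per_city']
    .unstack('category', fill_value=0)
    .sort_index()
)

# Add the row-wise total, then flatten back to ordinary columns.
wide['total'] = wide.sum(axis=1)
wide = wide.rename_axis(columns=None).reset_index()

print(wide)
```

Selecting the `count_per_city` Series before unstacking avoids the extra column level, so no droplevel step is needed in this variant.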

Upvotes: 0

tdy

Reputation: 41327

You can use pivot_table() with margins and fill_value:

out = df.pivot_table(
    index=['year', 'city'],
    columns='category',
    aggfunc='sum',
    fill_value=0,
    margins=True,
    margins_name='total'
).drop('total')

#            count_per_city              
# category                1   2   3 total
# year  city                             
# 2005  NYC             145  55   0   200
# 2006  LA                0   0  40    40
# 2007  ATL              75   0   0    75

If you want the exact output from the OP, you can do some cleanup (thanks to @HenryEcker):

out.droplevel(0, axis=1).rename_axis(columns=None).reset_index()

#    year city    1   2   3  total
# 0  2005  NYC  145  55   0    200
# 1  2006   LA    0   0  40     40
# 2  2007  ATL   75   0   0     75
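For anyone who wants to run this end to end, here is a sketch that rebuilds the question's sample frame by hand and applies the same pivot_table() call followed by the cleanup. The frame construction is an assumption about how the data looks in memory (integer categories, one row per year/city/category), and `flat` is just an illustrative name.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'year': [2005, 2007, 2005, 2006],
    'city': ['NYC', 'ATL', 'NYC', 'LA'],
    'category': [1, 1, 2, 3],
    'count_per_city': [145, 75, 55, 40],
})

# margins=True adds both a 'total' column and a 'total' row;
# .drop('total') removes the row and keeps the column.
out = df.pivot_table(
    index=['year', 'city'],
    columns='category',
    aggfunc='sum',
    fill_value=0,
    margins=True,
    margins_name='total',
).drop('total')

# Flatten the column MultiIndex and turn the row index back into columns.
flat = out.droplevel(0, axis=1).rename_axis(columns=None).reset_index()

print(flat)
```

Unlike unstack, pivot_table() aggregates, so this version also works when a (year, city, category) combination appears more than once: the duplicates are summed.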

Upvotes: 1
