Reputation: 11
After using value_counts and some other data cleaning, I have my data in the form:
year  city  category  count_per_city
2005  NYC   1         145
2007  ATL   1         75
2005  NYC   2         55
2006  LA    3         40
I'd like to convert it to this:
year  city  1    2   3   total
2005  NYC   145  55  0   200
2006  LA    0    0   40  40
2007  ATL   75   0   0   75
I feel like there is a relatively simple way to do this that I'm missing.
Upvotes: 0
Views: 50
Reputation: 19947
Another solution using unstack:
(
    df.set_index(['year', 'city', 'category']).unstack(2)
      .droplevel(0, axis=1)
      .assign(Total=lambda x: x.fillna(0).apply(sum, axis=1))
      .reset_index()
      .rename_axis(columns='')
)
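Note that the chain above only fills NaN inside the total calculation, so a year/city with no rows for a category still shows NaN in that column, and the total column is named Total rather than total. A minimal sketch of a variant that passes fill_value=0 to unstack, which should keep integer zeros instead. The DataFrame construction is an assumption, reconstructed from the sample table in the question:

```python
import pandas as pd

# Sample data reconstructed from the question (an assumption about the real frame).
df = pd.DataFrame({
    'year': [2005, 2007, 2005, 2006],
    'city': ['NYC', 'ATL', 'NYC', 'LA'],
    'category': [1, 1, 2, 3],
    'count_per_city': [145, 75, 55, 40],
})

wide = (
    df.set_index(['year', 'city', 'category'])['count_per_city']
      .unstack('category', fill_value=0)       # missing combinations become 0, not NaN
      .assign(total=lambda x: x.sum(axis=1))   # row-wise total across the category columns
      .rename_axis(columns=None)               # drop the leftover 'category' columns name
      .reset_index()                           # turn year/city back into regular columns
)
print(wide)
```

Selecting the count_per_city Series before unstacking avoids the extra column level, so no droplevel call is needed in this variant.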
Upvotes: 0
Reputation: 41327
You can use pivot_table() with margins and fill_value:
out = df.pivot_table(
    index=['year', 'city'],
    columns='category',
    aggfunc='sum',
    fill_value=0,
    margins=True,
    margins_name='total'
).drop('total')
#           count_per_city
# category               1   2   3 total
# year city
# 2005 NYC             145  55   0   200
# 2006 LA                0   0  40    40
# 2007 ATL              75   0   0    75
If you want the exact output from the OP, you can do some cleanup (thanks to @HenryEcker):
out.droplevel(0, axis=1).rename_axis(columns=None).reset_index()
#    year city    1   2   3  total
# 0  2005  NYC  145  55   0    200
# 1  2006   LA    0   0  40     40
# 2  2007  ATL   75   0   0     75
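For anyone who wants to run this end to end, here is a rough sketch that wraps the two steps above in one function and applies it to the question's sample rows. The function name wide_counts and the DataFrame construction are hypothetical, added only for illustration:

```python
import pandas as pd

def wide_counts(df):
    """Pivot long counts to one column per category plus a 'total' column."""
    out = df.pivot_table(
        index=['year', 'city'],
        columns='category',
        aggfunc='sum',
        fill_value=0,          # categories absent for a year/city become 0
        margins=True,          # adds a 'total' row and a 'total' column
        margins_name='total',
    ).drop('total')            # keep the total column, drop the total row
    # Flatten to the layout asked for in the question.
    return out.droplevel(0, axis=1).rename_axis(columns=None).reset_index()

# Sample data reconstructed from the question (an assumption about the real frame).
df = pd.DataFrame({
    'year': [2005, 2007, 2005, 2006],
    'city': ['NYC', 'ATL', 'NYC', 'LA'],
    'category': [1, 1, 2, 3],
    'count_per_city': [145, 75, 55, 40],
})

result = wide_counts(df)
print(result)
```

Because no values argument is given, pivot_table aggregates the only remaining column, count_per_city, which is why the first column level has to be dropped afterwards.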
Upvotes: 1