Reputation: 1769
I've got trouble removing the index column in pandas after groupby and unstack a DataFrame.
My original DataFrame looks like this:
example = pd.DataFrame({'date': ['2016-12', '2016-12', '2017-01', '2017-01', '2017-02', '2017-02', '2017-02'], 'customer': [123, 456, 123, 456, 123, 456, 456], 'sales': [10.5, 25.2, 6.8, 23.4, 29.5, 23.5, 10.4]})
example.head(10)
output:
date | customer | sales | |
---|---|---|---|
0 | 2016-12 | 123 | 10.5 |
1 | 2016-12 | 456 | 25.2 |
2 | 2017-01 | 123 | 6.8 |
3 | 2017-01 | 456 | 23.4 |
4 | 2017-2 | 123 | 29.5 |
5 | 2017-2 | 456 | 23.5 |
6 | 2017-2 | 456 | 10.4 |
Note that it's possible to have multiple sales for one customer per month (like in row 5 and 6).
My aim is to convert the DataFrame into an aggregated DataFrame like this:
customer | 2016-12 | 2017-01 | 2017-02 |
---|---|---|---|
123 | 10.5 | 6.8 | 29.5 |
234 | 25.2 | 23.4 | 33.9 |
My solution so far:
example = example[['date', 'customer', 'sales']].groupby(['date', 'customer']).sum().unstack('date')
example.head(10)
output:
sales | |||
---|---|---|---|
date | 2016-12 | 2017-01 | 2017-02 |
customer | |||
123 | 10.5 | 6.8 | 29.5 |
234 | 25.2 | 23.4 | 33.9 |
example = example['sales'].reset_index(level=[0])
example.head(10)
output:
date | customer | 2016-12 | 2017-01 | 2017-02 |
---|---|---|---|---|
0 | 123 | 10.5 | 6.8 | 29.5 |
1 | 234 | 25.2 | 23.4 | 33.9 |
At this point I'm unable to remove the "date" column:
example.reset_index(drop = True)
example.head()
output:
date | customer | 2016-12 | 2017-01 | 2017-02 |
---|---|---|---|---|
0 | 123 | 10.5 | 6.8 | 29.5 |
1 | 234 | 25.2 | 23.4 | 33.9 |
It just stays the same. Have you got any ideas?
Upvotes: 3
Views: 4741
Reputation: 1804
Why not directly go with pivot_table
?
(example
.pivot_table('sales', index='customer', columns="date", aggfunc='sum')
.rename_axis(columns=None).reset_index())
customer 2016-12 2017-01 2017-02
0 123 10.5 6.8 29.5
1 456 25.2 23.4 33.9
Upvotes: 3
Reputation: 28709
An alternative to your solution, but the key is just to add a rename_axis(columns = None)
, as the date
is the name for the columns axis:
(example[["date", "customer", "sales"]]
.groupby(["date", "customer"])
.sum()
.unstack("date")
.droplevel(0, axis="columns")
.rename_axis(columns=None)
.reset_index())
customer 2016-12 2017-01 2017-02
0 123 10.5 6.8 29.5
1 456 25.2 23.4 33.9
Upvotes: 6