Pandas: How to remove the index column after groupby and unstack?

Question

I've got trouble removing the index column in pandas after groupby and unstack a DataFrame.

My original DataFrame looks like this:

example = pd.DataFrame({'date': ['2016-12', '2016-12', '2017-01', '2017-01', '2017-02', '2017-02', '2017-02'], 'customer': [123, 456, 123, 456, 123, 456, 456], 'sales': [10.5, 25.2, 6.8, 23.4, 29.5, 23.5, 10.4]})
example.head(10)

output:

	date	customer	sales
0	2016-12	123	10.5
1	2016-12	456	25.2
2	2017-01	123	6.8
3	2017-01	456	23.4
4	2017-2	123	29.5
5	2017-2	456	23.5
6	2017-2	456	10.4

Note that it's possible to have multiple sales for one customer per month (like in row 5 and 6).

My aim is to convert the DataFrame into an aggregated DataFrame like this:

customer	2016-12	2017-01	2017-02
123	10.5	6.8	29.5
234	25.2	23.4	33.9

My solution so far:

example = example[['date', 'customer', 'sales']].groupby(['date', 'customer']).sum().unstack('date')
example.head(10)

output:

	sales
date	2016-12	2017-01	2017-02
customer
123	10.5	6.8	29.5
234	25.2	23.4	33.9

example = example['sales'].reset_index(level=[0])
example.head(10)

output:

date	customer	2016-12	2017-01	2017-02
0	123	10.5	6.8	29.5
1	234	25.2	23.4	33.9

At this point I'm unable to remove the "date" column:

example.reset_index(drop = True)
example.head()

output:

date	customer	2016-12	2017-01	2017-02
0	123	10.5	6.8	29.5
1	234	25.2	23.4	33.9

It just stays the same. Have you got any ideas?

sammywemmy · Accepted Answer

An alternative to your solution, but the key is just to add a rename_axis(columns = None), as the date is the name for the columns axis:

(example[["date", "customer", "sales"]]
.groupby(["date", "customer"])
.sum()
.unstack("date")
.droplevel(0, axis="columns")
.rename_axis(columns=None)
.reset_index())

    customer    2016-12 2017-01 2017-02
0   123          10.5   6.8     29.5
1   456          25.2   23.4    33.9

Pandas: How to remove the index column after groupby and unstack?

Answers (2)

Related Questions