Alexis
Alexis

Reputation: 2302

Reseting indexes when I have same name multi index in pandas

I have this dataset:

mydf = pd.DataFrame({'date':pd.date_range('01/01/2020', periods=48, freq='15D'), 
                     'value':np.random.randint(20,30,48)})
mydf
    date        value
0   2020-01-01  20
1   2020-01-16  28
2   2020-01-31  23
3   2020-02-15  27
4   2020-03-01  25
5   2020-03-16  25
...

And I want to count the values per each month, so I applied this:

mydf.groupby([mydf['date'].dt.year,mydf['date'].dt.month]).size()
date  date
2020  1       3
      2       1
      3       3
      4       2
      5       2
...

Now, I want to reset the indexes, in order to have this expected result:

Year  Month   Size
2020  1       3
2020  2       1
2020  3       3
2020  4       2
2020  5       2

So I tried this:

mydf.groupby([mydf['date'].dt.year,mydf['date'].dt.month]).size().\
    to_frame().reset_index()

But I received this error:

ValueError: cannot insert date, already exists

Then I tried to rename the columns to avoid this problem, but I received a multi index list of tuples, which I don't want:

mydf.groupby([mydf['date'].dt.year,mydf['date'].dt.month]).size().\
    to_frame().rename(columns={0:'Size'}).index.rename(['Year','Month'])
MultiIndex([(2020,  1),
            (2020,  2),
            (2020,  3),
            (2020,  4),
            (2020,  5),
...

What I am doing wrong? Please, is that my index name assignment is wrong? Do I have to reset levels or drop levels?

Upvotes: 1

Views: 746

Answers (2)

dhyat_obiwan
dhyat_obiwan

Reputation: 11

What you need is something like:

mydf_grouped = mydf.groupby([mydf['date'].dt.year,mydf['date'].dt.month]).size()

mydf_grouped.index.names = ['year', 'month']

mydf_grouped.reset_index()

Upvotes: 1

creanion
creanion

Reputation: 2753

A simple trick to know is that you can rename series inline like this

Instead of mydf['date'].dt.year

Do mydf['date'].dt.year.rename("year").

Upvotes: 3

Related Questions