Reputation: 2302
I have this dataset:
mydf = pd.DataFrame({'date':pd.date_range('01/01/2020', periods=48, freq='15D'),
'value':np.random.randint(20,30,48)})
mydf
date value
0 2020-01-01 20
1 2020-01-16 28
2 2020-01-31 23
3 2020-02-15 27
4 2020-03-01 25
5 2020-03-16 25
...
And I want to count the values per each month, so I applied this:
mydf.groupby([mydf['date'].dt.year,mydf['date'].dt.month]).size()
date date
2020 1 3
2 1
3 3
4 2
5 2
...
Now, I want to reset the indexes, in order to have this expected result:
Year Month Size
2020 1 3
2020 2 1
2020 3 3
2020 4 2
2020 5 2
So I tried this:
mydf.groupby([mydf['date'].dt.year,mydf['date'].dt.month]).size().\
to_frame().reset_index()
But I received this error:
ValueError: cannot insert date, already exists
Then I tried to rename the columns to avoid this problem, but I received a multi index list of tuples, which I don't want:
mydf.groupby([mydf['date'].dt.year,mydf['date'].dt.month]).size().\
to_frame().rename(columns={0:'Size'}).index.rename(['Year','Month'])
MultiIndex([(2020, 1),
(2020, 2),
(2020, 3),
(2020, 4),
(2020, 5),
...
What I am doing wrong? Please, is that my index name assignment is wrong? Do I have to reset levels or drop levels?
Upvotes: 1
Views: 746
Reputation: 11
What you need is something like:
mydf_grouped = mydf.groupby([mydf['date'].dt.year,mydf['date'].dt.month]).size()
mydf_grouped.index.names = ['year', 'month']
mydf_grouped.reset_index()
Upvotes: 1
Reputation: 2753
A simple trick to know is that you can rename series inline like this
Instead of mydf['date'].dt.year
Do mydf['date'].dt.year.rename("year")
.
Upvotes: 3