Reputation: 777
I have the following dataframe:
dataframe = pd.DataFrame({'date':pd.to_datetime(['2018-05-27', '2018-05-27','2018-05-28','2018-06-1']),
'code': ['1', '1', '1', '2']})
dataframe
date code
0 2018-05-27 1
1 2018-05-27 1
2 2018-05-28 1
3 2018-06-01 2
And I want:
dataframe = pd.DataFrame({'date':pd.to_datetime(['2018-05-27', '2018-05-28','2018-05-29','2018-05-30','2018-05-31', '2018-06-1']),
'1': [2, 1, 0, 0, 0, 0],
'2': [0, 0, 0, 0, 0, 1]})
dataframe
date 1 2
0 2018-05-27 2 0
1 2018-05-28 1 0
2 2018-05-29 0 0
3 2018-05-30 0 0
4 2018-05-31 0 0
5 2018-06-01 0 1
My first attempt is:
dataframe.groupby([dataframe['code'], dataframe['date'].dt.month.rename('month'), dataframe['date'].dt.day.rename('day')]).agg({"count"})
code month day date count
1 5 27 2
28 1
2 6 1 1
But this gives me two problems: (1) the date is no longer in a date format, and (2) I don't know how to transpose it. I've been fighting with pandas time series and aggregate functions, without much success. I would also like to be able to get the counts by month only. Any idea how to proceed?
I've already checked two related Stack Overflow questions (here and here), but they are not exactly the same.
Upvotes: 1
Views: 404
Reputation: 153460
You can use pd.crosstab:
pd.crosstab(dataframe['date'], dataframe['code'])\
.reindex(pd.date_range(dataframe['date'].min(),
dataframe['date'].max()), fill_value=0)
Output:
code 1 2
2018-05-27 2 0
2018-05-28 1 0
2018-05-29 0 0
2018-05-30 0 0
2018-05-31 0 0
2018-06-01 0 1
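The output above keeps the dates in the index, while the frame requested in the question has them in a regular date column. A minimal sketch of one way to get that shape, assuming the same sample data as in the question (the names full_range, counts and result are only illustrative):

```python
import pandas as pd

# Sample data copied from the question.
dataframe = pd.DataFrame({
    'date': pd.to_datetime(['2018-05-27', '2018-05-27', '2018-05-28', '2018-06-01']),
    'code': ['1', '1', '1', '2'],
})

# Every calendar day between the first and last observed date.
full_range = pd.date_range(dataframe['date'].min(), dataframe['date'].max())

# Count rows per (date, code) and add the missing days as all-zero rows.
counts = (
    pd.crosstab(dataframe['date'], dataframe['code'])
      .reindex(full_range, fill_value=0)
)

# Name the index 'date', drop the leftover 'code' label on the column axis,
# then move the dates back into an ordinary column.
result = (
    counts.rename_axis('date')
          .rename_axis(None, axis=1)
          .reset_index()
)
print(result)
```

The date column stays a datetime column, so it can still be used with the .dt accessor afterwards.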
Option #2
dataframe.groupby(['date','code']).size()\
.unstack(1, fill_value=0)\
.reindex(pd.date_range(dataframe['date'].min(),
dataframe['date'].max()),
fill_value=0)
Output:
code 1 2
2018-05-27 2 0
2018-05-28 1 0
2018-05-29 0 0
2018-05-30 0 0
2018-05-31 0 0
2018-06-01 0 1
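The question also asks for counts by month only, which neither option above covers. A possible sketch, following the same groupby/unstack pattern as Option #2 but grouping on a monthly period instead of the full date. It assumes the sample data from the question, and the names month, monthly and all_months are only illustrative:

```python
import pandas as pd

# Sample data copied from the question.
dataframe = pd.DataFrame({
    'date': pd.to_datetime(['2018-05-27', '2018-05-27', '2018-05-28', '2018-06-01']),
    'code': ['1', '1', '1', '2'],
})

# Collapse each timestamp to its calendar month (a Period such as 2018-05).
month = dataframe['date'].dt.to_period('M').rename('month')

# Count rows per (month, code) and pivot the codes into columns.
monthly = (
    dataframe.groupby([month, 'code'])
             .size()
             .unstack('code', fill_value=0)
)

# Add months that have no rows at all, mirroring the daily reindex.
all_months = pd.period_range(month.min(), month.max(), freq='M')
monthly = monthly.reindex(all_months, fill_value=0)
print(monthly)
```

Grouping on a Period rather than on dt.month keeps the year attached to each month, so data spanning several years would not be merged into the same row.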
Upvotes: 3