Xbel
Xbel

Reputation: 777

Pandas time-series: aggregate by day and transpose

I have the following dataframe:

dataframe = pd.DataFrame({'date':pd.to_datetime(['2018-05-27', '2018-05-27','2018-05-28','2018-06-1']), 
                         'code': ['1', '1', '1', '2']})
dataframe 
    date      code
0   2018-05-27  1
1   2018-05-27  1
2   2018-05-28  1
3   2018-06-01  2

And I want:

dataframe = pd.DataFrame({'date':pd.to_datetime(['2018-05-27', '2018-05-28','2018-05-29','2018-05-30','2018-05-31', '2018-06-1']),
                          '1': [2, 1, 1, 0, 0, 0], 
                          '2': [0, 0, 0, 0, 0, 1]})
dataframe
    date        1   2
0   2018-05-27  2   0
1   2018-05-28  1   0
2   2018-05-29  0   0
3   2018-05-30  0   0
4   2018-05-31  0   0
5   2018-06-01  0   1

My first attempt is:

dataframe.groupby([dataframe['code'], dataframe['date'].dt.month.rename('month'), dataframe['date'].dt.day.rename('day')]).agg({"count"})


code    month day    date count
1         5    27       2
               28       1
2         6     1       1

But gives me two problems: (1) the date is no longer in a date format, and (2) I don't know how to transpose it. I've been fighting with pandas time series and aggregate functions, with no much success. I also would like to have it just by months. Any idea of how to proceed?

I've already checked the following StackOverflow's questions. Here and here but is not exactly the same.

Upvotes: 1

Views: 404

Answers (1)

Scott Boston
Scott Boston

Reputation: 153460

You can use pd.crosstab:

pd.crosstab(dataframe['date'], dataframe['code'])\
  .reindex(pd.date_range(dataframe['date'].min(), 
                         dataframe['date'].max()), fill_value=0)

Output:

code        1  2
2018-05-27  2  0
2018-05-28  1  0
2018-05-29  0  0
2018-05-30  0  0
2018-05-31  0  0
2018-06-01  0  1

Option #2

dataframe.groupby(['date','code']).size()\
         .unstack(1, fill_value=0)\
         .reindex(pd.date_range(dataframe['date'].min(), 
                                dataframe['date'].max()), 
                  fill_value=0)

Output:

code        1  2
2018-05-27  2  0
2018-05-28  1  0
2018-05-29  0  0
2018-05-30  0  0
2018-05-31  0  0
2018-06-01  0  1

Upvotes: 3

Related Questions