Reputation: 111
I am looking to take a daily record of transactions and account for days when 0 transactions occurred.
Here is my initial dataframe:
df.head()
tr_timestamp text location
2016-01-01 cookies TX
2016-01-01 pizza TX
2016-01-04 apples TX
2016-01-08 bread TX
When I run a group by day, I get the following:
df_by_day = df['tr_timestamp'].groupby(df.tr_timestamp).count()
df_by_day
tr_timestamp
2016-01-01 2
2016-01-04 1
2016-01-08 1
I'd like to use Python/pandas to fill in the dates that have no transactions with 0, so that I get the following output:
df_by_day_filled
tr_timestamp
2016-01-01 2
2016-01-02 0
2016-01-03 0
2016-01-04 1
2016-01-05 0
2016-01-06 0
2016-01-07 0
2016-01-08 1
I've tried the following answers, but they don't quite give the output I need:
Pandas groupby for zero values
Fill Missing Dates in DataFrame with Duplicate Dates in Groupby
Thanks.
Upvotes: 2
Views: 1678
Reputation: 402603
This is a resample operation:
df.set_index(pd.to_datetime(df.pop('tr_timestamp'))).resample('D')['text'].count()
tr_timestamp
2016-01-01 2
2016-01-02 0
2016-01-03 0
2016-01-04 1
2016-01-05 0
2016-01-06 0
2016-01-07 0
2016-01-08 1
Freq: D, Name: text, dtype: int64
The pd.to_datetime call ensures this works even if "tr_timestamp" is not already a datetime column. Note that df.pop removes the column from df in place. If the column is already datetime, the solution simplifies to:
df.dtypes
tr_timestamp datetime64[ns]
text object
location object
dtype: object
df.set_index('tr_timestamp').resample('D')['text'].count()
tr_timestamp
2016-01-01 2
2016-01-02 0
2016-01-03 0
2016-01-04 1
2016-01-05 0
2016-01-06 0
2016-01-07 0
2016-01-08 1
Freq: D, Name: text, dtype: int64
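For anyone who wants to paste and run this, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and assumes the timestamps start out as strings. The variable name daily is mine, and I've used assign instead of df.pop so the original df is left untouched.

```python
import pandas as pd

# Sample data copied from the question; timestamps start out as plain strings.
df = pd.DataFrame({
    "tr_timestamp": ["2016-01-01", "2016-01-01", "2016-01-04", "2016-01-08"],
    "text": ["cookies", "pizza", "apples", "bread"],
    "location": ["TX", "TX", "TX", "TX"],
})

# Convert to datetime on a copy (df itself is not modified),
# index by the timestamp, then count rows per calendar day.
daily = (
    df.assign(tr_timestamp=pd.to_datetime(df["tr_timestamp"]))
      .set_index("tr_timestamp")
      .resample("D")["text"]
      .count()
)

print(daily)
```

Days with no rows come back as 0 because count() on an empty resample bin is 0. Keep in mind that resample only spans the first to the last date present in the data.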
Upvotes: 2
Reputation: 150765
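You can also call asfreq on the grouped result from the question, provided "tr_timestamp" is already a datetime column (asfreq needs a datetime-like index):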
df_by_day.asfreq('D', fill_value=0)
Output:
tr_timestamp
2016-01-01 2
2016-01-02 0
2016-01-03 0
2016-01-04 1
2016-01-05 0
2016-01-06 0
2016-01-07 0
2016-01-08 1
Freq: D, Name: tr_timestamp, dtype: int64
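Here is a runnable sketch of the same idea, assuming the timestamps start out as strings and need converting first. The second half is not part of the answer above: it swaps in reindex with an explicit pd.date_range, in case you also need zero rows before the first or after the last transaction. The names filled, window and windowed are hypothetical, and the window end date is made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "tr_timestamp": ["2016-01-01", "2016-01-01", "2016-01-04", "2016-01-08"],
    "text": ["cookies", "pizza", "apples", "bread"],
    "location": ["TX", "TX", "TX", "TX"],
})

# asfreq requires a datetime-like index, so convert the column first.
df["tr_timestamp"] = pd.to_datetime(df["tr_timestamp"])

# Same group-by as in the question: one count per day that has data.
df_by_day = df["tr_timestamp"].groupby(df["tr_timestamp"]).count()

# Insert the missing days between the first and last date, filled with 0.
filled = df_by_day.asfreq("D", fill_value=0)
print(filled)

# Alternative (assumption, not from the answer): reindex onto a fixed
# reporting window so days outside the data's own range also show 0.
window = pd.date_range("2016-01-01", "2016-01-10", freq="D")  # hypothetical window
windowed = df_by_day.reindex(window, fill_value=0)
print(windowed)
```

asfreq is enough when the first and last dates in the data are the bounds you care about. reindex is the one to reach for when the reporting period is fixed independently of the data.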
Upvotes: 4