Reputation: 455
My current dataframe:
| user   | date       | count |
| ------ | ---------- | ----- |
| user 1 | 2017-11-09 | 3 |
| user 1 | 2017-11-12 | 4 |
| user 1 | 2017-11-14 | 5 |
| user 2 | 2017-11-11 | 2 |
| user 2 | 2017-11-12 | 6 |
| user 2 | 2017-11-14 | 7 |
What I would like it to be:
| user   | date       | count |
| ------ | ---------- | ----- |
| user 1 | 2017-11-09 | 3 |
| user 1 | 2017-11-10 | 0 |
| user 1 | 2017-11-11 | 0 |
| user 1 | 2017-11-12 | 4 |
| user 1 | 2017-11-13 | 0 |
| user 1 | 2017-11-14 | 5 |
| user 2 | 2017-11-09 | 0 |
| user 2 | 2017-11-10 | 0 |
| user 2 | 2017-11-11 | 2 |
| user 2 | 2017-11-12 | 6 |
| user 2 | 2017-11-13 | 0 |
| user 2 | 2017-11-14 | 7 |
Any suggestions? I've tried the resample method, but I'm not sure it does what I need. Also, my dates are already converted to datetime objects.
EDIT: I'm thankful for the answers, but I've realized that my date column contains duplicate dates, so setting it as the index is a problem. Is there a way to approach this that doesn't require indexing (or that works around it)?
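For reference, here is a minimal snippet that builds the frame above:
import pandas as pd

df = pd.DataFrame({'user': ['user 1', 'user 1', 'user 1', 'user 2', 'user 2', 'user 2'],
                   'date': pd.to_datetime(['2017-11-09', '2017-11-12', '2017-11-14',
                                           '2017-11-11', '2017-11-12', '2017-11-14']),
                   'count': [3, 4, 5, 2, 6, 7]})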
Upvotes: 1
Views: 85
Reputation: 862406
Use `asfreq`; it works with a `DatetimeIndex`, so first call `set_index`:
#if necessary convert to datetimes
#df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date').asfreq('d', fill_value=0)
print (df)
count
date
2017-11-09 3
2017-11-10 0
2017-11-11 0
2017-11-12 4
2017-11-13 0
2017-11-14 5
If there are duplicate values in the `DatetimeIndex`, it is necessary to `resample` with some aggregate function like `sum`:
df = df.set_index('date').resample('d').sum()
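To demonstrate with duplicated dates, a minimal sketch with made-up sample data (not from the question) where one date appears twice:
dup = pd.DataFrame({'date': pd.to_datetime(['2017-11-09', '2017-11-09', '2017-11-12']),
                    'count': [3, 1, 4]})
print (dup.set_index('date')['count'].resample('d').sum())
date
2017-11-09    4
2017-11-10    0
2017-11-11    0
2017-11-12    4
Freq: D, Name: count, dtype: int64
The two rows for 2017-11-09 are summed to 4, and the missing days are filled with 0.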
EDIT: If you don't need all combinations of dates (each user is filled only between their own first and last observed date):
df = (df.set_index('date').groupby('user')['count']
        .apply(lambda x: x.asfreq('d', fill_value=0))
        .reset_index())
print (df)
user date count
0 user 1 2017-11-09 3
1 user 1 2017-11-10 0
2 user 1 2017-11-11 0
3 user 1 2017-11-12 4
4 user 1 2017-11-13 0
5 user 1 2017-11-14 5
6 user 2 2017-11-11 2
7 user 2 2017-11-12 6
8 user 2 2017-11-13 0
9 user 2 2017-11-14 7
EDIT 1: For all combinations of dates:
mux = pd.MultiIndex.from_product([df['user'].unique(),
                                  pd.date_range(df['date'].min(), df['date'].max())],
                                 names=['user', 'date'])
df = df.set_index(['user', 'date']).reindex(mux, fill_value=0).reset_index()
print (df)
user date count
0 user 1 2017-11-09 3
1 user 1 2017-11-10 0
2 user 1 2017-11-11 0
3 user 1 2017-11-12 4
4 user 1 2017-11-13 0
5 user 1 2017-11-14 5
6 user 2 2017-11-09 0
7 user 2 2017-11-10 0
8 user 2 2017-11-11 2
9 user 2 2017-11-12 6
10 user 2 2017-11-13 0
11 user 2 2017-11-14 7
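If you want to avoid set_index entirely, per the edit in the question, the same grid can be built as a plain frame and merged instead; a minimal sketch, where grid and out are hypothetical names:
grid = pd.MultiIndex.from_product([df['user'].unique(),
                                   pd.date_range(df['date'].min(), df['date'].max())],
                                  names=['user', 'date']).to_frame(index=False)
#left join against the full grid, then fill the gaps
out = grid.merge(df, on=['user', 'date'], how='left')
out['count'] = out['count'].fillna(0).astype(int) #NaN from the join makes the column float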
Upvotes: 4
Reputation: 3358
@jezrael's answer is really awesome! Just to add a bit: if you want to use an arbitrary date range:
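#note: this assumes 'date' is already the index, as in the accepted answer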
more_dates = pd.date_range('20171101', '20171120')
df = df.reindex(more_dates, fill_value=0)
count
2017-11-01 0.0
2017-11-02 0.0
2017-11-03 0.0
2017-11-04 0.0
2017-11-05 0.0
2017-11-06 0.0
2017-11-07 0.0
2017-11-08 0.0
2017-11-09 3.0
2017-11-10 0.0
2017-11-11 0.0
2017-11-12 4.0
2017-11-13 0.0
2017-11-14 5.0
2017-11-15 0.0
2017-11-16 0.0
2017-11-17 0.0
2017-11-18 0.0
2017-11-19 0.0
2017-11-20 0.0
Thanks @jezrael for the advice, edited.
Upvotes: 1
Reputation: 59264
You can create a `date_range` and use `merge`. Example:
>>> pd.date_range(start=df.date.min(), end=df.date.max(), freq='1D')
DatetimeIndex(['2017-11-09', '2017-11-10', '2017-11-11', '2017-11-12',
'2017-11-13', '2017-11-14'],
dtype='datetime64[ns]', freq='D')
Then
df2 = pd.DataFrame(pd.date_range(start=df.date.min(), end=df.date.max(), freq='1D'),
                   columns=["date"])
pd.merge(df2, df, on="date", how="left").fillna(0)
date count
0 2017-11-09 3.0
1 2017-11-10 0.0
2 2017-11-11 0.0
3 2017-11-12 4.0
4 2017-11-13 0.0
5 2017-11-14 5.0
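One caveat: the left join introduces NaN for the missing dates, so after fillna(0) the count column is float (hence the 3.0 above). A small follow-up sketch to cast it back, with out as a hypothetical name:
out = pd.merge(df2, df, on="date", how="left").fillna(0)
out["count"] = out["count"].astype(int) #safe now that the NaNs are filled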
Upvotes: 2