Reputation: 719
I have a dataframe like this:
DATE MIN_AMOUNT MAX_AMOUNT MIN_DAY MAX_DAY
01/09/2022 10 20 1 2
01/09/2022 15 25 4 5
01/09/2022 30 50 7 10
05/09/2022 10 20 1 2
05/09/2022 15 25 4 5
07/09/2022 15 25 4 5
I want to expand the dataframe to all date range between the DATE column with forward filling. The desired putput is:
DATE MIN_AMOUNT MAX_AMOUNT MIN_DAY MAX_DAY
01/09/2022 10 20 1 2
01/09/2022 15 25 4 5
01/09/2022 30 50 7 10
02/09/2022 10 20 1 2
02/09/2022 15 25 4 5
02/09/2022 30 50 7 10
03/09/2022 10 20 1 2
03/09/2022 15 25 4 5
03/09/2022 30 50 7 10
04/09/2022 10 20 1 2
04/09/2022 15 25 4 5
04/09/2022 30 50 7 10
05/09/2022 10 20 1 2
05/09/2022 15 25 4 5
06/09/2022 10 20 1 2
06/09/2022 15 25 4 5
07/09/2022 15 25 4 5
Could you please help me about this?
Upvotes: 2
Views: 482
Reputation: 28659
A couple of merges should help with this, and should still be efficient as the data size increases:
Get the unique dates and build a new dataframe from that:
out = df.DATE.drop_duplicates()
dates = pd.date_range(out.min(), out.max(), freq='D')
dates = pd.DataFrame(dates, columns=['dates'])
Merge dates
with out
, and subsequently merge the outcome with the original dataframe:
(dates
.merge(
out,
left_on='dates',
right_on='DATE',
how = 'left')
# faster to fill on a Series than a Dataframe
.assign(DATE = lambda df: df.DATE.ffill())
.merge(
df,
on = 'DATE',
how = 'left')
.drop(columns='DATE')
.rename(columns= {'dates':'DATE'})
)
DATE MIN_AMOUNT MAX_AMOUNT MIN_DAY MAX_DAY
0 2022-09-01 10 20 1 2
1 2022-09-01 15 25 4 5
2 2022-09-01 30 50 7 10
3 2022-09-02 10 20 1 2
4 2022-09-02 15 25 4 5
5 2022-09-02 30 50 7 10
6 2022-09-03 10 20 1 2
7 2022-09-03 15 25 4 5
8 2022-09-03 30 50 7 10
9 2022-09-04 10 20 1 2
10 2022-09-04 15 25 4 5
11 2022-09-04 30 50 7 10
12 2022-09-05 10 20 1 2
13 2022-09-05 15 25 4 5
14 2022-09-06 10 20 1 2
15 2022-09-06 15 25 4 5
16 2022-09-07 15 25 4 5
Upvotes: 0
Reputation: 862431
First convert values to datetimes, create helper counter Series g
by GroupBy.cumcount
for reshape by DataFrame.set_index
and DataFrame.unstack
, then use DataFrame.asfreq
with method='ffill'
and reshape back by DataFrame.stack
, remove helper level by DataFrame.droplevel
, convert DatetimeIndex
to column, change format of datetimes and last create same dtypes like original DataFrame:
df['DATE'] = pd.to_datetime(df['DATE'], dayfirst=True)
g = df.groupby('DATE').cumcount()
df = (df.set_index(['DATE',g])
.unstack()
.asfreq('D', method='ffill')
.stack()
.droplevel(-1)
.reset_index()
.assign(DATE = lambda x: x['DATE'].dt.strftime('%d/%m/%Y'))
.astype(df.dtypes)
)
print (df)
DATE MIN_AMOUNT MAX_AMOUNT MIN_DAY MAX_DAY
0 2022-01-09 10 20 1 2
1 2022-01-09 15 25 4 5
2 2022-01-09 30 50 7 10
3 2022-02-09 10 20 1 2
4 2022-02-09 15 25 4 5
5 2022-02-09 30 50 7 10
6 2022-03-09 10 20 1 2
7 2022-03-09 15 25 4 5
8 2022-03-09 30 50 7 10
9 2022-04-09 10 20 1 2
10 2022-04-09 15 25 4 5
11 2022-04-09 30 50 7 10
12 2022-05-09 10 20 1 2
13 2022-05-09 15 25 4 5
14 2022-06-09 10 20 1 2
15 2022-06-09 15 25 4 5
16 2022-07-09 15 25 4 5
Upvotes: 3