Zanam
Zanam

Reputation: 4807

Repeating rows of dataframe based on date range

I have a dataframe like so:

Col1    Col2    Col3    StartDate   EndDate     Qty
24HR    A1      B1      1/1/2020    2/28/2020   4.2
asd     A2      B2      5/1/2020    8/31/2020   35

I want to repeat each row of the dataframe based on number of months between StartDate and EndDate to get something of the following sort:

Col1    Col2    Col3    StartDate   EndDate    Qty
24HR    A1      B1      1/1/2020    1/31/2020   4.2
24HR    A1      B1      2/1/2020    2/28/2020   4.2
asd     A2      B2      5/1/2020    5/31/2020   35
asd     A2      B2      6/1/2020    6/30/2020   35
asd     A2      B2      7/1/2020    7/31/2020   35
asd     A2      B2      8/1/2020    8/31/2020   35

I am not sure how to approach this problem.

Upvotes: 1

Views: 985

Answers (3)

Darkknight
Darkknight

Reputation: 1826

You can try this...

list=[]
for i in range(df.shape[0]):
    list.append(int(df['EndDate'][i][0])-int(df['StartDate'][i][0])+1)

df= df.reindex(df.index.repeat(pd.Series(list)))
print(df)
# output
         Col1 Col2  Col3 StartDate  EndDate   Qty
     0  24HR   A1   B1    1/1/2020  2/28/2020   4.2
     0  24HR   A1   B1    1/1/2020  2/28/2020   4.2
     1   asd   A2   B2    5/1/2020  8/31/2020  35.0
     1   asd   A2   B2    5/1/2020  8/31/2020  35.0
     1   asd   A2   B2    5/1/2020  8/31/2020  35.0
     1   asd   A2   B2    5/1/2020  8/31/2020  35.0

Edit - Short version of this code

df= df.reindex(df.index.repeat(pd.Series(df.EndDate.str[0].astype(int)+1-df.StartDate.str[0].astype(int))))

Upvotes: 1

wwnde
wwnde

Reputation: 26676

#Coerce the dates to datetime

df.EndDate=pd.to_datetime(df.EndDate)
df.StartDate=pd.to_datetime(df.StartDate)

#Find the time delta in months
df['t']=((df.EndDate.sub(df.StartDate))/ np.timedelta64(1, 'M')).round(0).astype(int)

#Use `np.repeat`

pd.DataFrame(np.repeat(df.values,df.t,axis=0),columns=df.columns).drop(columns=['t'])

enter image description here

Upvotes: 1

BENY
BENY

Reputation: 323226

Let us try explode

df.StartDate=df.apply(lambda x : pd.date_range(start=x['StartDate'], end=x['EndDate'], freq='MS') ,axis=1)
df=df.explode('StartDate')
df.EndDate=df.StartDate+pd.tseries.offsets.MonthEnd()
df
   Col1 Col2 Col3  StartDate    EndDate   Qty
0  24HR   A1   B1 2020-01-01 2020-01-31   4.2
0  24HR   A1   B1 2020-02-01 2020-02-29   4.2
1   asd   A2   B2 2020-05-01 2020-05-31  35.0
1   asd   A2   B2 2020-06-01 2020-06-30  35.0
1   asd   A2   B2 2020-07-01 2020-07-31  35.0
1   asd   A2   B2 2020-08-01 2020-08-31  35.0

Upvotes: 2

Related Questions