Arruldhana Mathy

Reputation: 47

Fill an End_date column with the next row's Start_date in Python pandas

I have a dataframe as below (a slowly changing dimension). I want to fill the end_dt column with the next row's start_dt within each group identified by the Key column.

Key start_dt   end_dt
1   25-05-2019 NA
1   30-06-2019 NA
1   15-07-2019 NA
2   17-07-2019 NA
2   15-07-2019 NA

I want end_dt to be populated as below.

Key start_dt   end_dt
1   25-05-2019 30-06-2019
1   30-06-2019 15-07-2019
1   15-07-2019 NA
2   17-07-2019 NA
2   15-07-2019 17-07-2019
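
For reproducibility, here is a minimal sketch of how the sample input could be built. It assumes start_dt is held as day-first strings, end_dt starts out empty, and the key column is named Key as in the table (the variable name df is only illustrative):

```python
import numpy as np
import pandas as pd

# Sample input from the table above; end_dt is empty to begin with.
# Assumption: dates are plain strings in dd-mm-yyyy form.
df = pd.DataFrame({
    "Key": [1, 1, 1, 2, 2],
    "start_dt": ["25-05-2019", "30-06-2019", "15-07-2019",
                 "17-07-2019", "15-07-2019"],
    "end_dt": [np.nan] * 5,
})
```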

What I tried is below. After sorting by start_dt ascending, I generated a sequence number within each key group, starting at 2. I then created a second dataframe with the same key column and start_dt (renamed to end_dt), sorted it by start_dt, and generated a sequence number within each key group, starting at 1.

Then I joined the two dataframes on the key and sequence columns. Please help me find a better solution.

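import pandas as pd

# Number the rows within each Key group from 2, in start_dt order
# (the order is chronological only if start_dt is a datetime column)
df.sort_values(['start_dt'], ascending=[True], inplace=True)
df['seq'] = df.groupby(['Key']).cumcount() + 2

# Second frame: same keys, start_dt renamed to end_dt, numbered from 1
temp = df[['Key', 'start_dt']].rename(columns={'start_dt': 'end_dt'})
temp['seq'] = temp.groupby(['Key']).cumcount() + 1

# Join on Key and seq so each row picks up the next row's start_dt;
# the empty end_dt is dropped first to avoid a duplicate column
df = pd.merge(df.drop(columns='end_dt'), temp, how='left', on=['Key', 'seq'])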

Upvotes: 1

Views: 551

Answers (1)

yatu

Reputation: 88226

You first need to sort by both Key and start_dt with sort_values; then you can use groupby with shift(-1) to assign each row the following start_dt within its group:

df = df.sort_values(['Key', 'start_dt'])
df['end_dt'] = df.groupby('Key').start_dt.shift(-1)

print(df.reset_index(drop=True))

    Key    start_dt      end_dt
0    1  15-07-2019  25-05-2019
1    1  25-05-2019  30-06-2019
2    1  30-06-2019         NaN
3    2  15-07-2019  17-07-2019
4    2  17-07-2019         NaN
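
Note that in the output above start_dt is sorted as text, so 15-07-2019 comes before 25-05-2019 and the result differs from the one requested in the question. A minimal sketch of one way around this, assuming the dates are day-first strings in dd-mm-yyyy form, is to parse them with pd.to_datetime before sorting (the sample frame is rebuilt here only to keep the snippet self-contained):

```python
import pandas as pd

# Sample data from the question, with start_dt as dd-mm-yyyy strings (assumed)
df = pd.DataFrame({
    "Key": [1, 1, 1, 2, 2],
    "start_dt": ["25-05-2019", "30-06-2019", "15-07-2019",
                 "17-07-2019", "15-07-2019"],
})

# Parse to datetimes so the sort is chronological rather than lexicographic
df["start_dt"] = pd.to_datetime(df["start_dt"], format="%d-%m-%Y")

df = df.sort_values(["Key", "start_dt"]).reset_index(drop=True)

# Within each Key, take the next row's start_dt; the last row of a group gets NaT
df["end_dt"] = df.groupby("Key")["start_dt"].shift(-1)
```

With real datetimes, the last row of each Key group ends up with NaT in end_dt, and the other rows carry the next start_dt in chronological order.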

Upvotes: 3
