Reputation: 95
I have a pandas DataFrame with two columns: Name and date.
I am trying to create a new column, date_last, that stores the previous (next-earlier) date for the same Name in each row.
Sample input:
Name date
John 2020-05-04
John 2019-12-10
John 2019-11-17
John 2019-08-12
John 2019-01-10
John 2019-01-07
Sam 2020-05-01
Sam 2020-04-15
Sam 2020-03-22
Desired output:
Name date date_last
John 2020-05-04 2019-12-10
John 2019-12-10 2019-11-17
John 2019-11-17 2019-08-12
John 2019-08-12 2019-01-10
John 2019-01-10 2019-01-07
John 2019-01-07 None
Sam 2020-05-01 2020-04-15
Sam 2020-04-15 2020-03-22
Sam 2020-03-22 None
My attempt:
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'John', 'John', 'John', 'John', 'John', 'Sam', 'Sam', 'Sam'],
    'date': ['2020-05-04', '2019-12-10', '2019-11-17', '2019-08-12', '2019-01-10', '2019-01-07', '2020-05-01', '2020-04-15', '2020-03-22']})
df['date'] = pd.to_datetime(df['date'])
# Rank the dates within each Name (1 = earliest date)
df['dateRank'] = df.groupby('Name')['date'].rank(method='dense')
# Self-merge on Name, then keep only the pairs whose ranks differ by exactly 1
df = df.merge(df, on='Name', how='outer')
df = df[df['dateRank_x'] - df['dateRank_y'] == 1]
df = df[['Name', 'date_x', 'date_y']].rename(columns={'date_x': 'date', 'date_y': 'date_last'})
df
My output:
Name date date_last
1 John 2020-05-04 2019-12-10
8 John 2019-12-10 2019-11-17
15 John 2019-11-17 2019-08-12
22 John 2019-08-12 2019-01-10
29 John 2019-01-10 2019-01-07
37 Sam 2020-05-01 2020-04-15
41 Sam 2020-04-15 2020-03-22
This drops the earliest row for each Name, which should be kept with an empty date_last. Does anyone know how to achieve the desired output?
Upvotes: 1
Views: 129
Reputation: 75080
Sort by Name and date first, then group by Name and shift the date down one row. Because df.assign aligns on the index, each shifted value lands back on its original row:
out = df.assign(date_last=df.sort_values(['Name', 'date'])
                            .groupby('Name', sort=False)['date']
                            .shift())
print(out)
Name date date_last
0 John 2020-05-04 2019-12-10
1 John 2019-12-10 2019-11-17
2 John 2019-11-17 2019-08-12
3 John 2019-08-12 2019-01-10
4 John 2019-01-10 2019-01-07
5 John 2019-01-07 NaT
6 Sam 2020-05-01 2020-04-15
7 Sam 2020-04-15 2020-03-22
8 Sam 2020-03-22 NaT
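For reference, here is a self-contained sketch of the same sort-then-shift idea that can be run on its own. It assumes the frame has only the original Name and date columns (not the merged frame from the question). The shuffle step and the names `shuffled` and `prev` are illustrative additions, included only to show that the result does not depend on the row order of the input.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John"] * 6 + ["Sam"] * 3,
    "date": pd.to_datetime([
        "2020-05-04", "2019-12-10", "2019-11-17", "2019-08-12",
        "2019-01-10", "2019-01-07", "2020-05-01", "2020-04-15", "2020-03-22",
    ]),
})

# Illustrative only: shuffle the rows so the input is no longer pre-sorted.
shuffled = df.sample(frac=1, random_state=0)

# Sort ascending within each Name, then shift down by one row per group:
# each row receives the closest earlier date for the same Name, and the
# earliest row of each group receives NaT.
prev = (
    shuffled.sort_values(["Name", "date"])
    .groupby("Name", sort=False)["date"]
    .shift()
)

# assign() aligns `prev` on the index, so values land on the right rows
# even though `prev` is in sorted order and `shuffled` is not.
out = shuffled.assign(date_last=prev).sort_index()
print(out)
```

The missing values appear as NaT rather than None because date_last is a datetime column; pd.isna() treats both as missing.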
Upvotes: 1