How to update Pandas dataframe based on the other row?

Question

I have a dataframe that measure the student performance student as below:

ID  TestDate    PerformanceStatus (PS)
1   15/03/2016  0
1   01/04/2016  2
1   05/05/2016  1
1   07/06/2016  1
2   15/03/2016  0
2   01/04/2016  2
2   05/05/2016  1
2   07/06/2016  3
2   23/08/2016  1

I want to update my table to have a new column PreviousPerformanceStatus. This PreviousPerformanceStatus is calculated based on the performanceStatus monitored, as below: Note: If there is not performanceStatus recorded before the TestDate, I want to make the PreviousPerformanceStatus = PerformanceStatus

ID  TestDate    PS  PreviousPerformanceStatus
1   15/03/2016  0   0
1   01/04/2016  2   0
1   05/05/2016  1   2
1   07/06/2016  1   1
2   15/03/2016  0   0
2   01/04/2016  2   0
2   05/05/2016  1   2
2   07/06/2016  3   1
2   23/08/2016  1   3

I can do it with an SQL-statement, but how do I update my Dataframe with pandas. Thanks.

Example: (for with ID=1) The previousPerformanceStatus is calculated based on the PerformanceStatus from the "earlier" test date., so when TestDate=01/04/2016, I want to use the data from TestDate=15/03/2016. However, if I can't find any previous data, I will default the PreviousPerformanceStatus with the value in the PerformanceStatus

root · Accepted Answer

Perform a groupby on 'ID' and use shift and bfill:

# Ensure that the dates are pd.Timestamp, and sorted.
df['TestDate'] = pd.to_datetime(df['TestDate'], dayfirst=True)
df = df.sort_values(by=['ID', 'TestDate'])

# Perform a shift and bfill at the 'ID' group level.
df['PrevPS'] = df.groupby('ID')['PS'].apply(lambda grp: grp.shift().bfill()).astype(int)

You can skip the date conversion and sort if you know beforehand that your DataFrame will be sorted correctly.

The resulting output:

   ID   TestDate  PS  PrevPS
0   1 2016-03-15   0       0
1   1 2016-04-01   2       0
2   1 2016-05-05   1       2
3   1 2016-06-07   1       1
4   2 2016-03-15   0       0
5   2 2016-04-01   2       0
6   2 2016-05-05   1       2
7   2 2016-06-07   3       1
8   2 2016-08-23   1       3

How to update Pandas dataframe based on the other row?

Answers (1)

Related Questions