Reputation: 49
A bit complicated to explain this one (see example table below for reference).
I have a dataframe with a 'Date Received' column (datetime)
I want to compare the 'Date Received' with the date in the Stage columns to see if it was on-time or late. The problem I have is that each document corresponds to a different stage, for example, file 26 might have a Stage 4 Date where as File 28 might be Stage 1.
How do I get Python to search for the correct stage column and then compare with date received?
Filename Date Received Stage 1 Expected Stage 2 Expected Stage 3 Expected Stage 4 Expected
File 1 01/01/2021 15/12/2020 NaN NaN NaN
File 2 01/01/2021 NaN 05/01/2021 NaN NaN
Upvotes: 1
Views: 61
Reputation: 2430
You can use melt
with dropna()
:
df2 = df.melt(['Filename','Date Received']).dropna()
df2 = df2.reset_index(drop=True).rename({'variable':'Stage','value':'Date'},axis='columns')
Output:
>>> df2
Filename Date Received Stage Date
0 File 1 01/01/2021 Stage 1 Expected 15/12/2020
1 File 2 01/01/2021 Stage 2 Expected 05/01/2021
While the original data is still preserved in df
Now comparision:
df2['Date']=pd.to_datetime(df2['Date'], format='%d/%m/%Y')
df2['Date Received']=pd.to_datetime(df2['Date Received'], format='%d/%m/%Y')
df2['Status']=(df2['Date Received']>df2['Date']).map({False:'On-Time',True:'Late'})
Output of Comparison:
>>> df2
Filename Date Received Stage Date Status
0 File 1 2021-01-01 Stage 1 Expected 2020-12-15 Late
1 File 2 2021-01-01 Stage 2 Expected 2021-01-05 On-Time
Upvotes: 1
Reputation: 23099
It would be better if you melted your dataframe to compare the columns.
df1 = pd.melt(df,id_vars=['Filename','Date_Received'],var_name='Expected',value_name='Date')
#df1[['Date_Received','Date']] = df1[['Date_Received','Date']].apply(pd.to_datetime)
print(df1)
Filename Date_Received Expected Date
0 File_1 2021-01-01 Stage_1_Expected 2020-12-15
1 File_2 2021-01-01 Stage_1_Expected NaT
2 File_1 2021-01-01 Stage_2_Expected NaT
3 File_2 2021-01-01 Stage_2_Expected 2021-05-01
4 File_1 2021-01-01 Stage_3_Expected NaT
5 File_2 2021-01-01 Stage_3_Expected NaT
6 File_1 2021-01-01 Stage_4_Expected NaT
7 File_2 2021-01-01 Stage_4_Expected NaT
df1.loc[df1['Date'].isna(),'Status'] = 'Not Received'
df1.loc[df1['Date'] >= df1['Date_Received'], 'Status'] = 'On Time'
df1['Status'] = df1['Status'].fillna('Late')
print(df1)
Filename Date_Received Expected Date Status
0 File_1 2021-01-01 Stage_1_Expected 2020-12-15 Late
1 File_2 2021-01-01 Stage_1_Expected NaT Not Received
2 File_1 2021-01-01 Stage_2_Expected NaT Not Received
3 File_2 2021-01-01 Stage_2_Expected 2021-05-01 On Time
4 File_1 2021-01-01 Stage_3_Expected NaT Not Received
5 File_2 2021-01-01 Stage_3_Expected NaT Not Received
6 File_1 2021-01-01 Stage_4_Expected NaT Not Received
7 File_2 2021-01-01 Stage_4_Expected NaT Not Received
Upvotes: 1