NICode
NICode

Reputation: 49

Comparing One Column against Multiple

A bit complicated to explain this one (see example table below for reference).

I have a dataframe with a 'Date Received' column (datetime)

I want to compare the 'Date Received' with the date in the Stage columns to see if it was on-time or late. The problem I have is that each document corresponds to a different stage, for example, file 26 might have a Stage 4 Date where as File 28 might be Stage 1.

How do I get Python to search for the correct stage column and then compare with date received?

Filename Date Received  Stage 1 Expected  Stage 2 Expected  Stage 3 Expected  Stage 4 Expected
File 1   01/01/2021     15/12/2020        NaN               NaN               NaN
File 2   01/01/2021     NaN               05/01/2021        NaN               NaN

Upvotes: 1

Views: 61

Answers (2)

Rishabh Kumar
Rishabh Kumar

Reputation: 2430

You can use melt with dropna():

df2 = df.melt(['Filename','Date Received']).dropna()

df2 = df2.reset_index(drop=True).rename({'variable':'Stage','value':'Date'},axis='columns')

Output:

>>> df2
  Filename Date Received             Stage        Date
0   File 1    01/01/2021  Stage 1 Expected  15/12/2020
1   File 2    01/01/2021  Stage 2 Expected  05/01/2021

While the original data is still preserved in df

Now comparision:

df2['Date']=pd.to_datetime(df2['Date'], format='%d/%m/%Y')
df2['Date Received']=pd.to_datetime(df2['Date Received'], format='%d/%m/%Y')

df2['Status']=(df2['Date Received']>df2['Date']).map({False:'On-Time',True:'Late'})

Output of Comparison:

>>> df2
  Filename Date Received             Stage       Date   Status
0   File 1    2021-01-01  Stage 1 Expected 2020-12-15     Late
1   File 2    2021-01-01  Stage 2 Expected 2021-01-05  On-Time

Upvotes: 1

Umar.H
Umar.H

Reputation: 23099

It would be better if you melted your dataframe to compare the columns.

df1 = pd.melt(df,id_vars=['Filename','Date_Received'],var_name='Expected',value_name='Date')

#df1[['Date_Received','Date']] = df1[['Date_Received','Date']].apply(pd.to_datetime)

print(df1)

  Filename Date_Received          Expected       Date
0   File_1    2021-01-01  Stage_1_Expected 2020-12-15
1   File_2    2021-01-01  Stage_1_Expected        NaT
2   File_1    2021-01-01  Stage_2_Expected        NaT
3   File_2    2021-01-01  Stage_2_Expected 2021-05-01
4   File_1    2021-01-01  Stage_3_Expected        NaT
5   File_2    2021-01-01  Stage_3_Expected        NaT
6   File_1    2021-01-01  Stage_4_Expected        NaT
7   File_2    2021-01-01  Stage_4_Expected        NaT

df1.loc[df1['Date'].isna(),'Status'] = 'Not Received'
df1.loc[df1['Date'] >= df1['Date_Received'], 'Status'] = 'On Time'
df1['Status'] = df1['Status'].fillna('Late')

print(df1)

 Filename Date_Received          Expected       Date        Status
0   File_1    2021-01-01  Stage_1_Expected 2020-12-15          Late
1   File_2    2021-01-01  Stage_1_Expected        NaT  Not Received
2   File_1    2021-01-01  Stage_2_Expected        NaT  Not Received
3   File_2    2021-01-01  Stage_2_Expected 2021-05-01       On Time
4   File_1    2021-01-01  Stage_3_Expected        NaT  Not Received
5   File_2    2021-01-01  Stage_3_Expected        NaT  Not Received
6   File_1    2021-01-01  Stage_4_Expected        NaT  Not Received
7   File_2    2021-01-01  Stage_4_Expected        NaT  Not Received

Upvotes: 1

Related Questions