HBN
HBN

Reputation: 33

compare two date columns - check if they fall in range - take value from 3rd column

I am working on a question of whether patients' testresults were received between minus 1 and + 3 days after being admitted to hospital. If so, I want to retrieve to which department they were admitted to first.

Complexity comes from patients being tested multiple times during admission as well as patients being re-admitted within short time frames. Testing 3 times per week is not uncommon. However tests without admission are also possible. I want to make sure I get the admission department that is related the test result and its date.

Data comes from two different tables joined (for technical reasons / due to different data source) using python. I have done a left (outer) join of these dfs on pin so therefore final df is in long format.

For example

pin= [1522, 1522, 3830] 

date_rslt = ['2018-04-18', '2018-04-18', '2018-04-09'] 

date_admis = ['2017-12-14', '2018-04-17', '2018-04-08'] 
dept = ['ER', 'INT', 'ER']

df = pd.DataFrame(list(zip(pin, date_rslt, date_admis, dept)), 
               columns =['pin', 'date_rslt', 'date_admis', 'dept']) 

Questions:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

import pandas as pd
from datetime import datetime, timedelta

department = []

if (df_final['resultaatdatum'] < (df_final['date_admission'] +  timedelta(days = + 3))) and (df_final['resultaatdatum'] > (df_final['date_admission'] + timedelta(days = - 1))):
    department.append(df_final['admissiondepartment'])
else:
    department.append(NaN)
    
df_final['department'] = department

I should get df_final['department'] = [NaN, 'INT', 'ER']

Upvotes: 2

Views: 47

Answers (3)

mullinscr
mullinscr

Reputation: 1738

If you want to see which patients' test results were received between minus 1 and + 3 days after being admitted to hospital, then you could just query the data:

df['date_rslt'] = pd.to_datetime(df['date_rslt']) # will need to be datetime
df['date_admis'] = pd.to_datetime(df['date_admis']) # will need to be datetime

df['admis_plus_3'] = df['date_admis'] + pd.Timedelta(days=3)
df['admis_minus_1'] = df['date_admis'] + pd.Timedelta(days=-1)

patients = df.query('(date_rslt < admis_plus_3) & (date_rslt > admis_minus_1)')

If you needed the department list out from this you can just do: patients['dept'].

Or you could always do a merge with the other dataset if you need to incorporate NaN rows back into the data.

Upvotes: 0

DavideBrex
DavideBrex

Reputation: 2414

You can do it without a list in this way:

import numpy as np

m =(df['date_rslt'] < (df['date_admis'] +  timedelta(days = + 3))) & (df['date_rslt'] > (df['date_admis'] + timedelta(days = - 1)))

df.loc[~m,"dept"] = np.nan

Output:

    pin    date_rslt    date_admis  dept
0   1522    2018-04-18  2017-12-14  NaN
1   1522    2018-04-18  2018-04-17  INT
2   3830    2018-04-09  2018-04-08  ER

Upvotes: 1

Yash
Yash

Reputation: 1281

Use iterrows. a series cannot give a single boolean value which if expects.

for index,row in df_final.iterrows():
    if (df_final['date_rslt'][index] < (df_final['date_admis'][index] +  timedelta(days = + 3))) and (df_final['date_rslt'][index] > (df_final['date_admis'][index] + timedelta(days = - 1))):
        department.append(df_final['dept'][index])
    else:
        department.append(np.NaN)

output:

[nan, 'INT', 'ER']

Upvotes: 1

Related Questions