Reputation: 33
I am working on a question of whether patients' testresults were received between minus 1 and + 3 days after being admitted to hospital. If so, I want to retrieve to which department they were admitted to first.
Complexity comes from patients being tested multiple times during admission as well as patients being re-admitted within short time frames. Testing 3 times per week is not uncommon. However tests without admission are also possible. I want to make sure I get the admission department that is related the test result and its date.
Data comes from two different tables joined (for technical reasons / due to different data source) using python. I have done a left (outer) join of these dfs on pin so therefore final df is in long format.
For example
pin= [1522, 1522, 3830]
date_rslt = ['2018-04-18', '2018-04-18', '2018-04-09']
date_admis = ['2017-12-14', '2018-04-17', '2018-04-08']
dept = ['ER', 'INT', 'ER']
df = pd.DataFrame(list(zip(pin, date_rslt, date_admis, dept)),
columns =['pin', 'date_rslt', 'date_admis', 'dept'])
Questions:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
import pandas as pd
from datetime import datetime, timedelta
department = []
if (df_final['resultaatdatum'] < (df_final['date_admission'] + timedelta(days = + 3))) and (df_final['resultaatdatum'] > (df_final['date_admission'] + timedelta(days = - 1))):
department.append(df_final['admissiondepartment'])
else:
department.append(NaN)
df_final['department'] = department
I should get df_final['department'] = [NaN, 'INT', 'ER']
Upvotes: 2
Views: 47
Reputation: 1738
If you want to see which patients' test results were received between minus 1 and + 3 days after being admitted to hospital, then you could just query the data:
df['date_rslt'] = pd.to_datetime(df['date_rslt']) # will need to be datetime
df['date_admis'] = pd.to_datetime(df['date_admis']) # will need to be datetime
df['admis_plus_3'] = df['date_admis'] + pd.Timedelta(days=3)
df['admis_minus_1'] = df['date_admis'] + pd.Timedelta(days=-1)
patients = df.query('(date_rslt < admis_plus_3) & (date_rslt > admis_minus_1)')
If you needed the department list out from this you can just do: patients['dept']
.
Or you could always do a merge with the other dataset if you need to incorporate NaN rows back into the data.
Upvotes: 0
Reputation: 2414
You can do it without a list in this way:
import numpy as np
m =(df['date_rslt'] < (df['date_admis'] + timedelta(days = + 3))) & (df['date_rslt'] > (df['date_admis'] + timedelta(days = - 1)))
df.loc[~m,"dept"] = np.nan
Output:
pin date_rslt date_admis dept
0 1522 2018-04-18 2017-12-14 NaN
1 1522 2018-04-18 2018-04-17 INT
2 3830 2018-04-09 2018-04-08 ER
Upvotes: 1
Reputation: 1281
Use iterrows. a series cannot give a single boolean value which if
expects.
for index,row in df_final.iterrows():
if (df_final['date_rslt'][index] < (df_final['date_admis'][index] + timedelta(days = + 3))) and (df_final['date_rslt'][index] > (df_final['date_admis'][index] + timedelta(days = - 1))):
department.append(df_final['dept'][index])
else:
department.append(np.NaN)
output:
[nan, 'INT', 'ER']
Upvotes: 1