Reputation: 146
I am trying to return all values that fall between two other columns (`start_monitoring` and `end_monitoring`), but I keep running into `ValueError`s. Any idea how to solve this?
I have found one half-baked solution, but it is ...ugly...
My sample input looks like this:
| patient_id | delirium_sae | syncope_sae | tia_sae | start_monitoring | end_monitoring |
|------------|--------------|-------------|------------|------------------|----------------|
| 1 | 01-08-2020 | 29-07-2020 | | 30-07-2020 | 02-08-2020 |
| 1 | 03-08-2020 | | 05-08-2020 | 30-07-2020 | 02-08-2020 |
| 2 | | 02-08-2020 | | 01-08-2020 | 04-08-2020 |
| 2 | | 02-08-2020 | | 01-08-2020 | 04-08-2020 |
| 3 | | | 04-08-2020 | 02-08-2020 | 05-08-2020 |
| 3 | 31-07-2020 | 01-07-2020 | 20-07-2020 | 02-08-2020 | 05-08-2020 |
| 3 | 02-08-2020 | | 06-08-2020 | 02-08-2020 | 05-08-2020 |
My desired output would look like this:
| patient_id | delirium_sae | syncope_sae | tia_sae |
|------------|--------------|-------------|------------|
| 1 | 01-08-2020 | | |
| 2 | | 02-08-2020 | |
| 2 | | 02-08-2020 | |
| 3 | | | 04-08-2020 |
| 3 | 02-08-2020 | | |
So far I do a for-loop that looks like this (fixed the quoting on the monitoring columns and a stray `c` that should be `df`):
for i in range(len(df.columns)):
    print('--- ---', df.columns[i], '--- --- ---')
    print(df[df[df.columns[i]].between(df['start_monitoring'], df['end_monitoring'])][df.columns[i]].count())
This returns the total count for each column, but if I could get it back as a DataFrame, I could do more calculations with it.
Does anyone know how to go about it? Any idea why I end up with a ValueError when I try a mask like the one below?
df[(df[list_of_output_columns] >= df['start_monitoring']) &
(df[list_of_output_columns] <= df['end_monitoring'])]
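A likely cause of the ValueError: comparing a DataFrame with a Series via `>=`/`<=` aligns the Series *index* against the DataFrame's *column labels* instead of broadcasting per row. A minimal sketch (column names from the question, sample data invented) that forces the row-wise comparison with `ge`/`le` and `axis=0`, after parsing the dates so the comparison is chronological rather than lexical:

```python
import pandas as pd

# Hypothetical sample mirroring the question's layout (dd-mm-yyyy strings).
df = pd.DataFrame({
    "patient_id": [1, 2],
    "delirium_sae": ["01-08-2020", None],
    "syncope_sae": ["29-07-2020", "02-08-2020"],
    "start_monitoring": ["30-07-2020", "01-08-2020"],
    "end_monitoring": ["02-08-2020", "04-08-2020"],
})

# Parse every date column so comparisons are chronological, not lexical.
date_cols = ["delirium_sae", "syncope_sae", "start_monitoring", "end_monitoring"]
df[date_cols] = df[date_cols].apply(pd.to_datetime, format="%d-%m-%Y")

sae_cols = ["delirium_sae", "syncope_sae"]
# ge/le with axis=0 compare each row of the sub-frame against the Series,
# instead of aligning the Series index with the column labels.
mask = (df[sae_cols].ge(df["start_monitoring"], axis=0)
        & df[sae_cols].le(df["end_monitoring"], axis=0))
print(df[sae_cols][mask])
```

Entries outside the monitoring window come back as `NaT`, which keeps the result as a DataFrame for further calculations.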
Upvotes: 0
Views: 60
Reputation: 22493
I think you need to build the mask column by column with `apply`:
# parse every column to datetime (dd-mm-yyyy)
df = df.set_index("patient_id").apply(lambda d: pd.to_datetime(d, format="%d-%m-%Y"))
# per *_sae column: is the event date inside the monitoring window?
mask = df.filter(like="_sae").apply(lambda d: (df["start_monitoring"]<=d)&(d<=df["end_monitoring"]))
print (df[mask].filter(like="_sae").dropna(how="all"))
delirium_sae syncope_sae tia_sae
patient_id
1 2020-08-01 NaT NaT
2 NaT 2020-08-02 NaT
2 NaT 2020-08-02 NaT
3 NaT NaT 2020-08-04
3 2020-08-02 NaT NaT
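If the end goal is still the per-column totals the question's loop printed, `count()` on the filtered frame skips the `NaT` entries. A small sketch on a hand-built frame shaped like the output above (values invented for illustration):

```python
import pandas as pd

# Hypothetical frame shaped like the answer's output (NaT = no event in window).
out = pd.DataFrame(
    {
        "delirium_sae": [pd.Timestamp("2020-08-01"), pd.NaT, pd.NaT],
        "syncope_sae": [pd.NaT, pd.Timestamp("2020-08-02"), pd.NaT],
        "tia_sae": [pd.NaT, pd.NaT, pd.Timestamp("2020-08-04")],
    },
    index=pd.Index([1, 2, 3], name="patient_id"),
)

# count() ignores NaT, so this reproduces the per-column event totals.
print(out.count())
```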
Upvotes: 1