Jesper Mølgaard
Jesper Mølgaard

Reputation: 146

Check if several rows of datetime is between other two values

I am trying to get a return of all values that is between two other rows. I seem to constantly running into valueerrors. Any idea of how to solve this?

I have found one half-baked solution, but it is ...ugly...

My sample input looks like this:

| patient_id | delirium_sae | syncope_sae | tia_sae    | start_monitoring | end_monitoring |
|------------|--------------|-------------|------------|------------------|----------------|
| 1          | 01-08-2020   | 29-07-2020  |            | 30-07-2020       | 02-08-2020     |
| 1          | 03-08-2020   |             | 05-08-2020 | 30-07-2020       | 02-08-2020     |
| 2          |              | 02-08-2020  |            | 01-08-2020       | 04-08-2020     |
| 2          |              | 02-08-2020  |            | 01-08-2020       | 04-08-2020     |
| 3          |              |             | 04-08-2020 | 02-08-2020       | 05-08-2020     |
| 3          | 31-07-2020   | 01-07-2020  | 20-07-2020 | 02-08-2020       | 05-08-2020     |
| 3          | 02-08-2020   |             | 06-08-2020 | 02-08-2020       | 05-08-2020     |

My desired output would like this:

| patient_id | delirium_sae | syncope_sae | tia_sae    |
|------------|--------------|-------------|------------|
| 1          | 01-08-2020   |             |            |
| 2          |              | 02-08-2020  |            |
| 2          |              | 02-08-2020  |            |
| 3          |              |             | 04-08-2020 |
| 3          | 02-08-2020   |             |            |

So far i do a for-loop that looks like this:

for i in range(len(df.columns)):
    print ('--- ---',df.columns[i],'--- --- ---')
    print (df[df[df.columns[i]].between(df[start_monitoring], df[end_monitoring])][c.columns[i]].count())

Which returns the total count for each, but if can get it back as a dataframe, then i could do more calculations with it.

Anyone know how to go about it? Any ideas why i end up with value-error if i try to do a mask like below?

df[(df[list_of_output_columns] >= df['start_monitoring']) &
   (df[list_of_output_columns] <= df['end_monitoring'])]

Upvotes: 0

Views: 60

Answers (1)

Henry Yik
Henry Yik

Reputation: 22493

I think you need to apply it:

df = df.set_index("patient_id").apply(lambda d: pd.to_datetime(d, format="%d-%m-%Y"))

mask = df.filter(like="_sae").apply(lambda d: (df["start_monitoring"]<=d)&(d<=df["end_monitoring"]))

print (df[mask].filter(like="_sae").dropna(how="all"))

           delirium_sae syncope_sae    tia_sae
patient_id                                    
1            2020-08-01         NaT        NaT
2                   NaT  2020-08-02        NaT
2                   NaT  2020-08-02        NaT
3                   NaT         NaT 2020-08-04
3            2020-08-02         NaT        NaT

Upvotes: 1

Related Questions