Jesper Mølgaard
Jesper Mølgaard

Reputation: 146

Dataframe lookup values based on condition using dictionary

I am trying to lookup specific values in some columns of a dataframe, based on the content of other columns. Basically, if a row in column_1 == 1 then we want the data in the same row,column_2. I have quite a bit of values to lookup, so i have stored them in a dictionary, as this seems an intuitive way to approach the problem, but i can't get my functions to stick. I am not great with functions yet...

Example input:

| patient_id | delirium_date | delirium_sae | syncope_date | syncope_sae | tia_date   | tia_sae |
|------------|---------------|--------------|--------------|-------------|------------|---------|
| 1          | 01-08-2020    | 1            |              |             |            |         |
| 1          | 03-08-2020    | 0            |              |             |            |         |
| 2          |               |              | 02-08-2020   | 1           |            |         |
| 2          |               |              | 02-08-2020   | 1           |            |         |
| 3          |               |              |              |             | 04-08-2020 | 1       |
| 3          | 31-07-2020    | 0            | 01-07-2020   | 0           | 20-07-2020 | 0       |
| 3          | 02-08-2020    | 1            |              |             |            |         |

    lookup_dictionary = {'delirium_sae' : 'delirium_date',
                         'syncope_sae'  : 'syncope_date',
                         'tia_sae'      : 'tia_date'}

So if any of the column(dict(keys)) are set to '1' then get column(dict(values)) Desired output:

| patient_id | delirium_date | syncope_date | tia_date   |
|------------|---------------|--------------|------------|
| 1          | 01-08-2020    |              |            |
| 1          |               |              |            |
| 2          |               | 02-08-2020   |            |
| 2          |               | 02-08-2020   |            |
| 3          |               |              | 04-08-2020 |
| 3          |               |              |            |
| 3          | 02-08-2020    |              |            |

NB: It does not need to preserve empty patient_id rows, or empty columns.

As i said, i am not good at using functions, so if someone could walk me slowly through i should think when constructing one, then I can hopefully make my own next time...

Upvotes: 1

Views: 903

Answers (2)

jezrael
jezrael

Reputation: 862521

For correct working is necessary matching all keys and values of dictionary.

Idea is select all columns by keys of dictionary, compare by 1 and rename columns for match columns selected by values of dictonary in DataFrame.where:

lookup_dictionary = {'delirium_sae' : 'delirium_date',
                         'syncope_sae'  : 'syncope_date',
                         'tia_sae'      : 'tia_date'}
 

mask = df[list(lookup_dictionary.keys())].eq(1).rename(columns=lookup_dictionary)
print (mask)
   delirium_date  syncope_date  tia_date
0           True         False     False
1          False         False     False
2          False          True     False
3          False          True     False
4          False         False      True
5          False         False     False
6           True         False     False


df = df[['patient_id']].join(df[list(lookup_dictionary.values())].where(mask))
print (df)
   patient_id delirium_date syncope_date    tia_date
0           1    01-08-2020          NaN         NaN
1           1           NaN          NaN         NaN
2           2           NaN   02-08-2020         NaN
3           2           NaN   02-08-2020         NaN
4           3           NaN          NaN  04-08-2020
5           3           NaN          NaN         NaN
6           3    02-08-2020          NaN         NaN

Upvotes: 1

Shubham Sharma
Shubham Sharma

Reputation: 71689

Use DataFrame.mask to mask the columns associated with dictionary values based on the columns associated with dictionary keys:

keys, vals = dct.keys(), dct.values()
df1 = df[['patient_id']].join(df[vals].mask(df[keys].eq(0).to_numpy()))

Result:

print(df1)
   patient_id delirium_date syncope_date    tia_date
0           1    01-08-2020          NaN         NaN
1           1           NaN          NaN         NaN
2           2           NaN   02-08-2020         NaN
3           2           NaN   02-08-2020         NaN
4           3           NaN          NaN  04-08-2020
5           3           NaN          NaN         NaN
6           3    02-08-2020          NaN         NaN

Upvotes: 1

Related Questions