Reputation: 146
I am trying to lookup specific values in some columns of a dataframe, based on the content of other columns. Basically, if a row in column_1 == 1 then we want the data in the same row,column_2. I have quite a bit of values to lookup, so i have stored them in a dictionary, as this seems an intuitive way to approach the problem, but i can't get my functions to stick. I am not great with functions yet...
Example input:
| patient_id | delirium_date | delirium_sae | syncope_date | syncope_sae | tia_date | tia_sae |
|------------|---------------|--------------|--------------|-------------|------------|---------|
| 1 | 01-08-2020 | 1 | | | | |
| 1 | 03-08-2020 | 0 | | | | |
| 2 | | | 02-08-2020 | 1 | | |
| 2 | | | 02-08-2020 | 1 | | |
| 3 | | | | | 04-08-2020 | 1 |
| 3 | 31-07-2020 | 0 | 01-07-2020 | 0 | 20-07-2020 | 0 |
| 3 | 02-08-2020 | 1 | | | | |
lookup_dictionary = {'delirium_sae' : 'delirium_date',
'syncope_sae' : 'syncope_date',
'tia_sae' : 'tia_date'}
So if any of the column(dict(keys)) are set to '1' then get column(dict(values)) Desired output:
| patient_id | delirium_date | syncope_date | tia_date |
|------------|---------------|--------------|------------|
| 1 | 01-08-2020 | | |
| 1 | | | |
| 2 | | 02-08-2020 | |
| 2 | | 02-08-2020 | |
| 3 | | | 04-08-2020 |
| 3 | | | |
| 3 | 02-08-2020 | | |
NB: It does not need to preserve empty patient_id rows, or empty columns.
As i said, i am not good at using functions, so if someone could walk me slowly through i should think when constructing one, then I can hopefully make my own next time...
Upvotes: 1
Views: 903
Reputation: 862521
For correct working is necessary matching all keys and values of dictionary.
Idea is select all columns by keys of dictionary, compare by 1
and rename columns for match columns selected by values of dictonary in DataFrame.where
:
lookup_dictionary = {'delirium_sae' : 'delirium_date',
'syncope_sae' : 'syncope_date',
'tia_sae' : 'tia_date'}
mask = df[list(lookup_dictionary.keys())].eq(1).rename(columns=lookup_dictionary)
print (mask)
delirium_date syncope_date tia_date
0 True False False
1 False False False
2 False True False
3 False True False
4 False False True
5 False False False
6 True False False
df = df[['patient_id']].join(df[list(lookup_dictionary.values())].where(mask))
print (df)
patient_id delirium_date syncope_date tia_date
0 1 01-08-2020 NaN NaN
1 1 NaN NaN NaN
2 2 NaN 02-08-2020 NaN
3 2 NaN 02-08-2020 NaN
4 3 NaN NaN 04-08-2020
5 3 NaN NaN NaN
6 3 02-08-2020 NaN NaN
Upvotes: 1
Reputation: 71689
Use DataFrame.mask
to mask the columns associated with dictionary values based on the columns associated with dictionary keys:
keys, vals = dct.keys(), dct.values()
df1 = df[['patient_id']].join(df[vals].mask(df[keys].eq(0).to_numpy()))
Result:
print(df1)
patient_id delirium_date syncope_date tia_date
0 1 01-08-2020 NaN NaN
1 1 NaN NaN NaN
2 2 NaN 02-08-2020 NaN
3 2 NaN 02-08-2020 NaN
4 3 NaN NaN 04-08-2020
5 3 NaN NaN NaN
6 3 02-08-2020 NaN NaN
Upvotes: 1