KubiK888
KubiK888

Reputation: 4723

Using apply function in Pandas while referencing and looping through another df?

I have a drug reference as a df called df_drug_ref (below). There are three drugs (A, B, and C). The corresponding ATC is listed in the second column. However, if patient has a DIN within the Drug_BIN_Id_Exclusion list, then s/he would not be considered as using that drug (ie. 011235 for Drug A).

Drug        Drug_ATC_Id         Drug_DIN_Id_Exclusion
A           N123                [011235]
B           B5234               [65413, 654351]
C           N32456              []

The following is the other df called df_row. This captures all the drugs dispensed by each individual. And each individual has his own People_Id.

People_Id   Drug_ATC            Drug_DIN                A           B           C
1001        N123                                        
1001        N123                011235                  
1001        N32456              011232                  
1001        N111                                        
1002        B5234               65413                       
1002        B5234               654090                  
1002        N123                011235                  

I would like to assign '1' for the corresponding drug (looping iteratively to check for A, B, or C and assigning to the corresponding columns) if, in that row, the ATC code matches with the drug reference and the DIN is not contained within the exclusion list. The result should be:

People_Id   Drug_ATC            Drug_DIN                A           B           C
1001        N123                                        1           0           0
1001        N123                011235                  0           0           0
1001        N32456              011232                  0           0           1
1001        N111                                        0           0           0
1002        B5234               65413                   0           0           0       
1002        B5234               654090                  0           1           0
1002        N123                011235                  0           0           0

I understand how to use apply function within the same df itself, but I don't know how to also use an external df as reference.

Upvotes: 1

Views: 49

Answers (2)

KubiK888
KubiK888

Reputation: 4723

This is a working solution using function and iterrows:

def check_rx_condition(row):
    for index, col in df_drug_ref.iterrows():
        if ((col['Drug_ATC_Id'] in row['Drug_ATC'])&
            (row['DRUG_DIN'] not in col['Drug_DIN_Id_Exclusion'])):
            row[col['Drug']] = 1
        else:
            row[col['Drug']] = 0
    return row

df_row = df_row.apply(check_rx_condition, axis=1)

Upvotes: 1

Ben.T
Ben.T

Reputation: 29635

First you can split your lists to several columns with apply(pd.Series) and join them to df_drug_ref:

print (df_drug_ref.join(df_drug_ref['Drug_DIN_Id_Exclusion'].apply(pd.Series)))
  Drug Drug_ATC_Id Drug_DIN_Id_Exclusion       0       1
0    A        N123              [011235]  011235     NaN
1    B       B5234       [65413, 654351]   65413  654351
2    C      N32456                    []     NaN     NaN

Then you can merge on the column 'Drug_ATC' the above joined dataframe to People_Id, after doing some cleaning on columns:

df_merge = People_Id.merge(df_drug_ref[['Drug', 'Drug_ATC_Id']]
                                        .join(df_drug_ref['Drug_DIN_Id_Exclusion']
                                                         .apply(pd.Series)
                                                         .add_prefix('Drug_DIN_'))
                                      .rename(columns={'Drug_ATC_Id':'Drug_ATC'}),
                           how='left')

to get df_merge:

   People_Id Drug_ATC Drug_DIN Drug Drug_DIN_0 Drug_DIN_1
0       1001     N123             A     011235        NaN
1       1001     N123   011235    A     011235        NaN
2       1001   N32456   011235    C        NaN        NaN
3       1001     N111           NaN        NaN        NaN
4       1002    B5234    65413    B      65413     654351
5       1002    B5234   654090    B      65413     654351
6       1002     N123   011235    A     011235        NaN

Now you can replace the column 'Drug' with NaN where the value in 'Drug_DIN' is in one of the columns 'Drug_DIN_i' with np.any:

mask = np.any(df_merge.filter(like='Drug_DIN').iloc[:,:1].values == 
              df_merge.filter(like='Drug_DIN').iloc[:,1:].values, axis=1)
df_merge.loc[mask,'Drug'] = np.nan

Finally, to create the columns A, B, C ... you can use pd.get_dummies with set_index and then reset_index:

new_People_Id = pd.get_dummies(df_merge.set_index(['People_Id','Drug_ATC','Drug_DIN'])['Drug']).reset_index()
print (new_People_Id)
   People_Id Drug_ATC Drug_DIN  A  B  C
0       1001     N123           1  0  0
1       1001     N123   011235  0  0  0
2       1001   N32456   011235  0  0  1
3       1001     N111           0  0  0
4       1002    B5234    65413  0  0  0
5       1002    B5234   654090  0  1  0
6       1002     N123   011235  0  0  0

Note here you can also use join such as:

new_People_Id = df_merge[['People_Id','Drug_ATC','Drug_DIN']].join(df_merge['Drug'].str.get_dummies())

maybe faster.

Upvotes: 1

Related Questions