How to convert columns with multiple values into multiple columns with binary values?

Question

I am working with a medical database, and I have a dataframe that looks like this:

      INC_KEY   COMORBID1    COMORBID2    COMORBID3  COMORBID4
0   160389417   Hypertension None         None       None
1   160789043   COPD         Hypertension Diabetes   None
2   160039662   Hypertension ADLC         Other      None
3   160367584   Diabetes     None         None       None
4   160008818   None         None         None       None

As you can see, the comorbidities are spread across numbered columns (COMORBID1, COMORBID2, and so on), and each of those columns can hold any of several different values.

I need to reshape this so that the column names are the comorbidities and each value is either 0 (no) or 1 (yes).

Example:

      INC_KEY  HYPERTENSION COPD ADLC DIABETES
 0  160389417             1    0    0        0
 1  160789043             1    1    0        1
 2  160039662             1    0    1        0
 3  160367584             0    0    0        1
 4  160008818             0    0    0        0

I've given a simplified version, but there are 24 different possible comorbidities with which I need to do this.

I have tried pd.get_dummies(), but it does not work the way I need it to. It creates a separate column for each unique value in EACH of COMORBID1 through COMORBID24. So instead of 24 new columns, I end up with 24*24=576 new columns.

So with get_dummies the new column names would be:

COMORBID1_HYPERTENSION, COMORBID1_COPD, COMORBID1_ADLC, COMORBID1_DIABETES, COMORBID2_HYPERTENSION, COMORBID2_COPD, COMORBID2_ADLC, COMORBID2_DIABETES...

and so on all the way through 24.
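
For reference, here is a minimal reproduction of that behavior on the sample data above. It assumes the "None" entries are real missing values (None/NaN) rather than the string "None"; the names `comorbid_cols` and `per_column` are only illustrative.

```python
import pandas as pd

# Sample frame from the question; missing entries are assumed to be real None values.
df = pd.DataFrame({
    "INC_KEY": [160389417, 160789043, 160039662, 160367584, 160008818],
    "COMORBID1": ["Hypertension", "COPD", "Hypertension", "Diabetes", None],
    "COMORBID2": [None, "Hypertension", "ADLC", None, None],
    "COMORBID3": [None, "Diabetes", "Other", None, None],
    "COMORBID4": [None, None, None, None, None],
})

comorbid_cols = [c for c in df.columns if c.startswith("COMORBID")]

# get_dummies encodes each source column separately and prefixes the new
# column names with the source column name, so the same comorbidity shows up
# once per COMORBIDn column in which it occurs.
per_column = pd.get_dummies(df, columns=comorbid_cols)
print(list(per_column.columns))
```

Hypertension ends up in both a COMORBID1_ and a COMORBID2_ column here, which is the duplication I want to avoid.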

What is the best way to do what I am trying to do?

Thank you in advance to anyone who helps :)

Answers (1)
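
A minimal sketch of one way to get this layout, assuming the missing entries are real None/NaN values and the frame has a unique row index. The idea is to stack all COMORBID columns into one long Series, one-hot encode that, and then collapse back to one row per patient. The names `comorbid_cols`, `long`, `flags` and `result` are illustrative, not from the question.

```python
import pandas as pd

# Sample frame from the question; missing entries are assumed to be real None values.
df = pd.DataFrame({
    "INC_KEY": [160389417, 160789043, 160039662, 160367584, 160008818],
    "COMORBID1": ["Hypertension", "COPD", "Hypertension", "Diabetes", None],
    "COMORBID2": [None, "Hypertension", "ADLC", None, None],
    "COMORBID3": [None, "Diabetes", "Other", None, None],
    "COMORBID4": [None, None, None, None, None],
})

comorbid_cols = [c for c in df.columns if c.startswith("COMORBID")]

# One long Series indexed by (original row, source column).
# Missing entries are either dropped by stack() or ignored by get_dummies(),
# depending on the pandas version; both give the same final result.
long = df[comorbid_cols].stack()

flags = (
    pd.get_dummies(long)               # one indicator column per distinct comorbidity
    .groupby(level=0).max()            # collapse to one row per original row
    .astype(int)                       # booleans -> 0/1
    .reindex(df.index, fill_value=0)   # restore rows that had no comorbidities
)
flags.columns = flags.columns.str.upper()

# Put the key column back next to the indicator columns.
result = pd.concat([df[["INC_KEY"]], flags], axis=1)
print(result)
```

Because the encoding runs on the stacked values rather than on each column separately, the number of new columns equals the number of distinct comorbidities (24 in the full data), not 24*24. On the sample data this also yields an OTHER column, since "Other" appears as a value; drop it if it is not wanted.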
