Create Dummies for Multiple Columns on Unique Value in a Column

Question

I have a DataFrame like the one below. CTI and RESOLUTION each have multiple categories. The goal is to create dummy variables for the CTI and RESOLUTION categories, with one row per account, including a 0 for any category that has no entry for that account.

    ACCOUNT  | CTI     | RESOLUTION
    59737001 | Data:HI | Customer Owned Issue / Customer Equipment
    59737001 | Data:HI | Repaired / Replaced Drop Underground
    13847688 | Data:OK | Not Repaired

My expected output is

    ACCOUNT  | CTI_Data:HI | CTI_Data:OK | RESOLUTION_Customer Owned... | RESOLUTION_Repaired/Repla.... | RESOLUTION_Not Repaired
    59737001 | 1           | 0           | 1                            | 1                             | 0

I know pd.get_dummies() creates dummies for multiple categories, but my case is different. Any help is appreciated.

scotscotmcc · Accepted Answer

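I believe you can get this by combining pd.get_dummies() with df.groupby().any(). get_dummies() produces one row of dummies per original row, and groupby().any() then collapses those rows to one per account, returning True/False for each dummy column, so you finish by converting the result to int.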

import pandas as pd
# df is the DataFrame from the question; columns= restricts the dummies to just those columns.
df2 = pd.get_dummies(df, columns=['CTI', 'RESOLUTION'])
df2.groupby('ACCOUNT').any().astype(int)
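
For anyone who wants to run this end to end, here is a minimal sketch of the same two steps on the three sample rows from the question. The DataFrame construction and the names `dummies` and `result` are my own assumptions for illustration, not part of the original post.

```python
import pandas as pd

# Sample rows copied from the question (construction is assumed for illustration).
df = pd.DataFrame({
    "ACCOUNT": [59737001, 59737001, 13847688],
    "CTI": ["Data:HI", "Data:HI", "Data:OK"],
    "RESOLUTION": [
        "Customer Owned Issue / Customer Equipment",
        "Repaired / Replaced Drop Underground",
        "Not Repaired",
    ],
})

# Step 1: one dummy column per category, still one row per original row.
dummies = pd.get_dummies(df, columns=["CTI", "RESOLUTION"])

# Step 2: collapse to one row per account. any() is True if the account
# has at least one row in that category; astype(int) turns that into 1/0.
result = dummies.groupby("ACCOUNT").any().astype(int)

print(result)
```

The result has one row per account, indexed by ACCOUNT, so both accounts appear rather than only 59737001. If you would rather have ACCOUNT as a regular column, calling `result.reset_index()` should do it.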
