Reputation: 135
I have a DataFrame like the one below. CTI and RESOLUTION each have multiple categories, and the goal is to create dummy variables for the CTI and RESOLUTION categories, one row per account, including the categories that have no entry for a given account.
ACCOUNT | CTI | RESOLUTION
59737001 | Data:HI | Customer Owned Issue / Customer Equipment
59737001 | Data:HI | Repaired / Replaced Drop Underground
13847688 | Data:OK | Not Repaired
My expected output is:
ACCOUNT | CTI_Data:HI | CTI_Data:OK | RESOLUTION_Customer Owned... | RESOLUTION_Repaired/Repla.... | RESOLUTION_Not Repaired
59737001 | 1 | 0 | 1 | 1 | 0
I know pd.get_dummies() works for getting the dummies for multiple categories, but my case is different.
Any help is appreciated.
Upvotes: 0
Views: 708
Reputation: 3113
I believe you can get this by combining pd.get_dummies() with df.groupby().any(). The groupby().any() step returns True/False for each dummy column, so you finish by converting the result to int:
import pandas as pd
# df is the DataFrame from your first example; passing columns= restricts the dummies to just those columns.
df2 = pd.get_dummies(df, columns=['CTI', 'RESOLUTION'])
df2.groupby('ACCOUNT').any().astype(int)
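To make that easier to try, here is a rough end-to-end sketch. The sample DataFrame is my reconstruction of the table in the question, so the exact values are an assumption, and the names `dummies` and `result` are just illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question's table (assumed values).
df = pd.DataFrame({
    "ACCOUNT": [59737001, 59737001, 13847688],
    "CTI": ["Data:HI", "Data:HI", "Data:OK"],
    "RESOLUTION": [
        "Customer Owned Issue / Customer Equipment",
        "Repaired / Replaced Drop Underground",
        "Not Repaired",
    ],
})

# One-hot encode only CTI and RESOLUTION; ACCOUNT passes through unchanged.
dummies = pd.get_dummies(df, columns=["CTI", "RESOLUTION"])

# Collapse to one row per account: a flag is 1 if any of that
# account's rows carried the category, otherwise 0.
result = dummies.groupby("ACCOUNT").any().astype(int).reset_index()
print(result)
```

The reset_index() call is optional; it just turns ACCOUNT back into a regular column instead of the index. The dummy columns may come out in a different order than in your expected output, since get_dummies orders them by category value.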
Upvotes: 1