Guga

Reputation: 349

How to one-hot encode with pandas without combining row levels

I have created a very large DataFrame in pandas, similar to the following:

             0         1
user
0     product4  product0
1     product3  product1

I want to use something like pd.get_dummies() so that the final DataFrame looks like this:

      product0  product1  product2  product3  product4
user
0            1         0         0         0         1
1            0         1         0         1         0

instead of getting the following from pd.get_dummies():

      0_product3  0_product4  1_product0  1_product1
user
0              0           1           1           0
1              1           0           0           1
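
For anyone who wants to reproduce this, here is a minimal sketch of the setup. It assumes the column labels are the integers 0 and 1 and that the index is named user, as the tables above suggest; the real frame is much larger.

```python
import pandas as pd

# Small stand-in for the real frame (assumed layout: integer column labels, index named "user").
df = pd.DataFrame(
    {0: ["product4", "product3"], 1: ["product0", "product1"]},
    index=pd.Index([0, 1], name="user"),
)

# By default, get_dummies prefixes every dummy column with its source column label.
default_dummies = pd.get_dummies(df)
print(list(default_dummies.columns))
```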

In summary, I do not want the original column labels (0 and 1) combined into the names of the binary columns. Thanks a lot!

Upvotes: 2

Views: 44

Answers (2)

Kariru

Reputation: 111

import pandas as pd

df = pd.get_dummies(df, prefix='', prefix_sep='')  # drop the source-column prefix and the underscore separator
df = df.sort_index(axis=1)  # sort the columns by name
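
One caveat: if the same product can appear in more than one source column, removing the prefix leaves duplicate column names. A possible workaround, sketched below on a hypothetical frame where product0 appears in both columns, is to reshape to one product per row first and then collapse back to one row per user. The `dtype=int` argument is an addition here, since recent pandas versions return booleans by default.

```python
import pandas as pd

# Hypothetical frame: product0 appears in both source columns.
df = pd.DataFrame(
    {0: ["product4", "product0"], 1: ["product0", "product1"]},
    index=pd.Index([0, 1], name="user"),
)

# Reshape to one product per row, keeping the user index on every row.
long = df.melt(ignore_index=False)["value"]

# Encode the single value column, then collapse back to one row per user.
out = pd.get_dummies(long, dtype=int).groupby(level=0).max()
print(out)
```

Taking the maximum per user marks a product as 1 if it appears in any of that user's source columns.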

Upvotes: 1

Zero

Reputation: 76917

Use reindex with get_dummies to add the columns for products that never occur (product2 here):

In [539]: dff = pd.get_dummies(df, prefix='', prefix_sep='')

In [540]: s = dff.columns.str[-1].astype(int)

In [541]: cols = 'product' + pd.RangeIndex(s.min(), s.max()+1).astype(str)

In [542]: dff.reindex(columns=cols, fill_value=0)
Out[542]:
      product0  product1  product2  product3  product4
user
0            1         0         0         0         1
1            0         1         0         1         0
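
Note that `.str[-1]` reads only the last character of each column name, so it assumes single-digit product numbers. A sketch of the same idea that should also cope with names such as product12 follows; it assumes every value starts with the literal prefix "product", and `dtype=int` is added because recent pandas versions return booleans by default.

```python
import pandas as pd

df = pd.DataFrame(
    {0: ["product4", "product3"], 1: ["product0", "product1"]},
    index=pd.Index([0, 1], name="user"),
)

dff = pd.get_dummies(df, prefix="", prefix_sep="", dtype=int)

# Strip the literal "product" prefix so multi-digit ids parse as well.
ids = [int(name[len("product"):]) for name in dff.columns]

# Build the full, gap-free list of expected column names.
cols = [f"product{i}" for i in range(min(ids), max(ids) + 1)]

# reindex adds the missing columns (product2 here) and fills them with 0.
result = dff.reindex(columns=cols, fill_value=0)
print(result)
```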

Upvotes: 2
