Why does Pandas get_dummies function not also perform a 'pivot'?

Question

I have a pandas dataframe that looks like this:

Customer      Product
   A           Table
   A           Chair
   A           Desk

and when I run the Pandas get_dummies function on Product, I get this:

Customer   Product_Table    Product_Chair    Product_Desk
   A             1                 0                0 
   A             0                 1                0
   A             0                 0                1

Is this correct as pre-modeling preparation? It seems I'm feeding the model customer A's information three separate times. The first row says the customer has only a Table and no Chair or Desk, but in reality they have all three.

How does this affect the model? My gut tells me that this kind of conversion should leave me with only one row per customer. Is that right? If so, what did I do wrong, or what do I need to add, to eliminate the 'duplicate' rows?

Below is the syntax I'm using:

import pandas as pd

# Create a list of features to dummy
todummy_list = []
for col_name in sdf.columns:
    if sdf[col_name].dtypes == 'object' and (col_name != 'Customer' ):
        todummy_list.append(col_name)
print(todummy_list)


# Function to dummy all the categorical variables used for modeling
def dummy_df(df, todummy_list):
    for x in todummy_list:
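        # Encode the column from the df passed in, not the global sdf
        dummies = pd.get_dummies(df[x], prefix=x, dummy_na=False)
        # The positional axis argument to drop was removed in pandas 2.0
        df = df.drop(columns=x)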
        df = pd.concat([df, dummies], axis=1)
    return df

sdf = dummy_df(sdf, todummy_list)

print(sdf.head(5))

jpp · Accepted Answer

To eliminate the "duplicate rows", you can just use pd.crosstab:

res = pd.crosstab(df['Customer'], df['Product'])

print(res)

Product   Chair  Desk  Table
Customer                    
A             1     1      1
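
Note that pd.crosstab returns counts, so a customer who appears twice with the same product would get a 2 in that cell. If you want strict 0/1 indicators and would rather keep using get_dummies, one possible approach is to encode first and then take the per-customer maximum. This is a minimal sketch, not the only way to do it; customer B is a hypothetical extra row added purely for illustration, and the name per_customer is my own:

```python
import pandas as pd

# Sample data from the question, plus a hypothetical customer B
df = pd.DataFrame({
    "Customer": ["A", "A", "A", "B"],
    "Product": ["Table", "Chair", "Desk", "Chair"],
})

# One-hot encode each row (one row per purchase)
dummies = pd.get_dummies(df["Product"], prefix="Product", dtype=int)

# Collapse to one row per customer: max keeps a 1 if any row had it
per_customer = dummies.groupby(df["Customer"]).max()

print(per_customer)
```

get_dummies works row by row and has no notion of which rows belong to the same customer, which is why the grouping step has to be explicit.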
