Ranjini

Reputation: 1

How to one-hot encode a dataframe column with multiple strings?

I am building a regression model to predict food delivery time.

This is the dataframe with a few observations:

[Image 1: screenshot of the dataframe]

As you can see, the Cuisines column holds several comma-separated strings per row. I used this code:

pd.get_dummies(data.Cuisines.str.split(',',expand=True),prefix='c')

This split the strings and one-hot encoded them, but it introduced a new issue.

I merged the dataframe with the dummies. fastfood appears in the 1st and 3rd rows. I expected a single fastfood column with value 1 in the first and third rows; instead, two fastfood columns are created: one (the 4th column) for the first row and another (the 15th column) for the third row.

[Image 2: screenshot of the merged dataframe with the dummy columns]

Can someone help me get a single fastfood column with value 1 in the first and third rows, and likewise for the other cuisines?

Upvotes: 0

Views: 64

Answers (1)

Quang Hoang

Reputation: 150725

The two Fast Food columns differ by a stray space left over from splitting on the comma alone. You probably want to try:

data.Cuisines.str.replace(r'\s*,\s*', ',', regex=True).str.strip().str.get_dummies(',')
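
To check the idea on a small frame, here is a minimal sketch. The sample rows are made up (the real data is only visible in the screenshots), and the `c_` prefix simply mirrors the `prefix='c'` from the question. It assumes the items are separated by commas with optional surrounding spaces, strips that whitespace first, and then splits on a literal comma so each cuisine yields exactly one column.

```python
import pandas as pd

# Hypothetical sample rows; the actual dataframe is not available as text.
data = pd.DataFrame({
    "Cuisines": [
        "Fast Food, Rolls, Burger",
        "Ice Cream, Desserts",
        "Italian, Street Food, Fast Food",
    ]
})

# Remove whitespace around the commas (and at the ends of each string)
# so that "Fast Food" and " Fast Food" become the same label.
cleaned = data["Cuisines"].str.replace(r"\s*,\s*", ",", regex=True).str.strip()

# One indicator column per distinct cuisine, split on a literal comma.
dummies = cleaned.str.get_dummies(sep=",").add_prefix("c_")

# Attach the indicator columns to the original frame.
result = data.join(dummies)
print(result.filter(like="c_"))
```

With these rows, `c_Fast Food` should be 1 for the first and third rows and 0 for the second. Unlike `pd.get_dummies` on the expanded split, this does not create a separate column per position in the list.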

Upvotes: 1
