Amit Modi

Reputation: 222

What is the difference between one-hot encoding and pandas.Categorical.codes?

I am working on a problem and have the following question:

The data set has a text column with the following unique values:

array(['1 bath', 'na', '1 shared bath', '1.5 baths', '1 private bath',
       '2 baths', '1.5 shared baths', '3 baths', 'Half-bath',
       '2 shared baths', '2.5 baths', '0 shared baths', '0 baths',
       '5 baths', 'Private half-bath', 'Shared half-bath', '4.5 baths',
       '5.5 baths', '2.5 shared baths', '3.5 baths', '15.5 baths',
       '6 baths', '4 baths', '3 shared baths', '4 shared baths',
       '3.5 shared baths', '6 shared baths', '6.5 shared baths',
       '6.5 baths', '4.5 shared baths', '7.5 baths', '5.5 shared baths',
       '7 baths', '8 shared baths', '5 shared baths', '8 baths',
       '10 baths', '7 shared baths'], dtype=object)

If I use CountVectorizer to convert them to a one-hot encoding,


from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer()
vectorizer.fit(X_train[colname].values)

I get the following error:


AttributeError: 'float' object has no attribute 'lower'


Please let me know the cause of the error.

Can I use the following instead?

pd.Categorical(_DF_LISTING_EDA.bathrooms_text).codes

What is the difference between one-hot encoding and pd.Categorical(...).codes?

Thanks,
Amit Modi

Upvotes: 2

Views: 436

Answers (1)

Yefet

Reputation: 2086

If you want one-hot encoding using pandas, you can do:

pandas.get_dummies(X_train[colname])
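As for the error and the difference between the two encodings, here is a minimal sketch on made-up data. The Series `s` below is a hypothetical stand-in for the bathrooms column, and it assumes the training column contains missing values. By default CountVectorizer lowercases every entry, and a missing value is a float NaN with no `.lower()` method, which is the likely cause of the AttributeError. Filling the missing values with a placeholder string first (here "na", an arbitrary choice) is one way around it.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for the bathrooms_text column.
s = pd.Series(["1 bath", "2 baths", np.nan, "1 bath", "1 shared bath"],
              dtype=object)

# A missing value is stored as a float NaN, which has no .lower() method.
assert isinstance(s[2], float)

# Integer codes: a single column. Categories are sorted and numbered,
# and missing values get the code -1.
codes = pd.Categorical(s).codes
print(codes.tolist())  # [0, 2, -1, 0, 1]

# One-hot encoding: one 0/1 column per category, with no implied order.
one_hot = pd.get_dummies(s.fillna("na"), dtype=int)
print(one_hot.columns.tolist())  # ['1 bath', '1 shared bath', '2 baths', 'na']
print(one_hot.iloc[0].tolist())  # [1, 0, 0, 0]
```

So the codes give one integer column whose numbers imply an ordering that a model may treat as meaningful, while one-hot encoding gives one indicator column per category. Which one fits depends on the model you feed it to.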

Upvotes: 1
