What is the difference between One Hot Encoding and pandas Categorical codes?

Question

I am working on a problem and have the following question:

The data set has a text column with the following unique values:

array(['1 bath', 'na', '1 shared bath', '1.5 baths', '1 private bath',
       '2 baths', '1.5 shared baths', '3 baths', 'Half-bath',
       '2 shared baths', '2.5 baths', '0 shared baths', '0 baths',
       '5 baths', 'Private half-bath', 'Shared half-bath', '4.5 baths',
       '5.5 baths', '2.5 shared baths', '3.5 baths', '15.5 baths',
       '6 baths', '4 baths', '3 shared baths', '4 shared baths',
       '3.5 shared baths', '6 shared baths', '6.5 shared baths',
       '6.5 baths', '4.5 shared baths', '7.5 baths', '5.5 shared baths',
       '7 baths', '8 shared baths', '5 shared baths', '8 baths',
       '10 baths', '7 shared baths'], dtype=object)

If I use CountVectorizer to convert them to a one-hot encoding,

from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer()
vectorizer.fit(X_train[colname].values)

I get the following error:

AttributeError: 'float' object has no attribute 'lower'

Please let me know the cause of the error.

Can I use this instead?

pd.Categorical(_DF_LISTING_EDA.bathrooms_text).codes

What is the difference between one-hot encoding and pd.Categorical(...).codes?

Thanks,
Amit Modi

Yefet · Accepted Answer

If you want one-hot encoding using pandas, you can do:

pandas.get_dummies(X_train[colname])
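
To make the difference concrete, here is a minimal sketch on a small made-up sample (the `baths` Series is hypothetical and only stands in for `X_train[colname]`). It assumes the column contains missing values stored as NaN, which is the most likely cause of the `'float' object has no attribute 'lower'` error: NaN is a float, and a text vectorizer tries to lowercase every entry. One-hot encoding produces one 0/1 column per category and implies no order, while `pd.Categorical(...).codes` produces a single integer column whose numbers follow the sorted category order and carry no real meaning as magnitudes.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for X_train[colname]; np.nan is a missing entry.
baths = pd.Series(
    ["1 bath", "2 baths", np.nan, "1 bath", "1 shared bath"], dtype=object
)

# Missing entries are floats (NaN), not strings, so they have no .lower().
missing_types = {type(v).__name__ for v in baths if not isinstance(v, str)}
print(missing_types)

# Without cleaning, Categorical marks missing values with the code -1.
raw_codes = pd.Categorical(baths).codes
print(raw_codes)

# Replace missing entries with a placeholder string before encoding.
clean = baths.fillna("na")

# One-hot encoding: one 0/1 column per distinct category, no implied order.
one_hot = pd.get_dummies(clean, dtype=int)
print(one_hot)

# Integer codes: one column; categories are numbered in sorted order.
codes = pd.Categorical(clean).codes
print(codes)
```

As a rough guide, integer codes are compact and usually fine for tree-based models, whereas linear or distance-based models may read the codes as an ordering, so one-hot encoding tends to be the safer choice there.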
