I am working on a problem and have a question.
In my dataset there is a text column with the following unique values:
array(['1 bath', 'na', '1 shared bath', '1.5 baths', '1 private bath',
'2 baths', '1.5 shared baths', '3 baths', 'Half-bath',
'2 shared baths', '2.5 baths', '0 shared baths', '0 baths',
'5 baths', 'Private half-bath', 'Shared half-bath', '4.5 baths',
'5.5 baths', '2.5 shared baths', '3.5 baths', '15.5 baths',
'6 baths', '4 baths', '3 shared baths', '4 shared baths',
'3.5 shared baths', '6 shared baths', '6.5 shared baths',
'6.5 baths', '4.5 shared baths', '7.5 baths', '5.5 shared baths',
'7 baths', '8 shared baths', '5 shared baths', '8 baths',
'10 baths', '7 shared baths'], dtype=object)
If I use CountVectorizer to convert them to a one-hot encoding:
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer()
vectorizer.fit(X_train[colname].values)
I get the following error:
AttributeError: 'float' object has no attribute 'lower'
What is causing this error?
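A likely cause, assuming the column holds real missing values: pandas stores them as float NaN, and CountVectorizer lowercases every document by default (lowercase=True), so it ends up calling .lower() on a float. The sketch below uses a small hypothetical sample and imitates that lowercasing step with a plain list comprehension rather than calling scikit-learn, so it only illustrates the mechanism. The placeholder string "missing" is an arbitrary, hypothetical choice.

```python
import numpy as np
import pandas as pd

# Hypothetical sample of the column; the real data is assumed to contain missing values
bathrooms_text = pd.Series(["1 bath", "1 shared bath", np.nan, "2 baths"])

# Missing entries are stored as floats (NaN), not as strings
print(bathrooms_text.map(type).value_counts())

# CountVectorizer lowercases each document by default; this mimics that step
error_message = None
try:
    lowered = [doc.lower() for doc in bathrooms_text]
except AttributeError as exc:
    error_message = str(exc)
print(error_message)

# Filling missing values with a placeholder string (hypothetical choice) avoids the error
cleaned = bathrooms_text.fillna("missing")
lowered = [doc.lower() for doc in cleaned]
print(lowered)
```

Under the same assumption, the original call would become vectorizer.fit(X_train[colname].fillna("missing").values).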
Alternatively, instead of CountVectorizer, can I use:
pd.Categorical(_DF_LISTING_EDA.bathrooms_text).codes
What is the difference between one-hot encoding and pd.Categorical(...).codes?
Thanks, Amit Modi
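To illustrate the difference on a small hypothetical sample: pd.Categorical(...).codes is a label encoding, a single integer column in which each distinct string maps to one integer (in sorted category order) and missing values map to -1. One-hot encoding, here produced with pd.get_dummies, gives one 0/1 column per category instead. A model that treats the codes as plain numbers will read an order into them (below, "Half-bath" gets a larger code than "2 baths"), which one-hot encoding avoids at the cost of more columns.

```python
import numpy as np
import pandas as pd

# Hypothetical sample of the bathrooms_text column
bathrooms_text = pd.Series(["1 bath", "2 baths", "1 bath", np.nan, "Half-bath"])

# Label encoding: one integer per row; categories are sorted, NaN becomes -1
codes = pd.Categorical(bathrooms_text).codes
print(list(codes))

# One-hot encoding: one 0/1 column per category; the NaN row is all zeros
one_hot = pd.get_dummies(bathrooms_text, dtype=int)
print(one_hot)
```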
If you want one-hot encoding using pandas, you can do:
import pandas
pandas.get_dummies(X_train[colname])
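A slightly fuller sketch of the get_dummies approach, using a hypothetical stand-in for X_train (the "accommodates" column and its values are made up) and assuming colname is "bathrooms_text". The prefix argument keeps the new column names traceable to the source column, and dummy_na=True adds an extra indicator column for missing values, so NaN entries do not have to be filled first.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for X_train; column names and values are illustrative only
X_train = pd.DataFrame({
    "accommodates": [2, 1, 4, 3],
    "bathrooms_text": ["1 bath", "1 shared bath", np.nan, "2 baths"],
})
colname = "bathrooms_text"

# One 0/1 column per distinct string, plus a trailing column flagging NaN
dummies = pd.get_dummies(X_train[colname], prefix=colname, dummy_na=True, dtype=int)

# Replace the text column with its one-hot columns
X_train_encoded = pd.concat([X_train.drop(columns=colname), dummies], axis=1)
print(X_train_encoded.columns.tolist())
```

Unlike CountVectorizer, which splits each string into word tokens (with its default token_pattern, single-character tokens such as "1" are dropped, so "2 baths" and "1.5 baths" both reduce to "baths"), get_dummies treats each full string as its own category.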