Reputation: 3477
I was reading about one-hot encoding in Python and came across an expression whose meaning I cannot work out. The code is the following:
ohe = preprocessing.OneHotEncoder(dtype=int, sparse=True, handle_unknown="ignore")
data = ohe.fit_transform(df[["country"]])
When I print the categories like this:
print(ohe.categories_)
It prints [array(['EEUU', 'France', 'Portugal', 'Italy'], dtype=object)]
but when I do this:
print(ohe.categories_[0])
it prints ['EEUU', 'France', 'Portugal', 'Italy']
I was not able to find out what that [0] does. It seems to convert the array to a list, but then why not use something like the tolist() function?
I have searched the web but could not find an explanation of this expression. Any help?
Thanks
Upvotes: 1
Views: 274
Reputation: 59274
[array(['EEUU', 'France', 'Portugal', 'Italy'], dtype=object)] is a list with one object. This object is a NumPy array. When you do ohe.categories_[0], you access the first item of this list, which happens to be the only item in the list.
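To make the difference between indexing and converting concrete, here is a minimal sketch. The variable `categories` is a hypothetical stand-in for `ohe.categories_`, built by hand so the snippet only needs NumPy:

```python
import numpy as np

# Hypothetical stand-in for ohe.categories_: a plain Python list holding one array.
categories = [np.array(["EEUU", "France", "Italy", "Portugal"], dtype=object)]

first = categories[0]      # list indexing: picks the array out of the list, no conversion
as_list = first.tolist()   # conversion: builds a Python list from that array

print(type(categories).__name__)
print(type(first).__name__)
print(type(as_list).__name__)
```

So `[0]` only selects an element. If you actually want a Python list, you would still call `tolist()` on the selected array.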
ohe.categories_ returns a list because it holds a separate NumPy array for each column in your input. Since df[["country"]] has only one column, it returns a list with only one object.
If you did df[["country", "second_column"]], for instance, you'd get a list with two arrays, stating the categories for each.
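A short sketch of the two-column case, assuming pandas and scikit-learn are installed. The DataFrame contents, including the values in `second_column`, are made up for illustration:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Made-up data: two categorical columns.
df = pd.DataFrame({
    "country": ["EEUU", "France", "Portugal", "Italy"],
    "second_column": ["a", "b", "a", "b"],
})

ohe = OneHotEncoder(handle_unknown="ignore")
ohe.fit(df[["country", "second_column"]])

# One array of categories per input column, in column order.
print(len(ohe.categories_))
for column, cats in zip(["country", "second_column"], ohe.categories_):
    print(column, cats.tolist())
```

Here `ohe.categories_[0]` holds the categories found in `country` and `ohe.categories_[1]` those found in `second_column`.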
Upvotes: 1