Little

Reputation: 3477

Python Pandas OneHotEncoder categories

I was reading about one-hot encoding in Python and came across a line whose meaning I cannot work out. The code is the following:

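# np.int was removed from NumPy (use int); sparse= was renamed sparse_output= in scikit-learn 1.2
ohe = preprocessing.OneHotEncoder(dtype=int, sparse_output=True, handle_unknown="ignore")
data = ohe.fit_transform(df[["country"]])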

The thing is when I print the values of categories like this:

print(ohe.categories_)

It prints [array(['EEUU', 'France', 'Portugal', 'Italy'], dtype=object)]

but when I do this:

print(ohe.categories_[0])

['EEUU', 'France', 'Portugal', 'Italy']

I was not able to find out what that [0] does. It seems to convert an array to a list, but why not use something like the tolist() function?

I have searched the web but could not find an explanation of this expression. Any help?

Thanks

Upvotes: 1

Views: 274

Answers (1)

rafaelc

Reputation: 59274

[array(['EEUU', 'France', 'Portugal', 'Italy'], dtype=object)] is a list with one object, and that object is a NumPy array. When you do ohe.categories_[0], you access the first item of this list, which happens to be its only item. Nothing is converted: the result is still a NumPy array, and print simply shows it without the array(...) wrapper.
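To make the difference between indexing and converting concrete, here is a minimal sketch that uses a hand-built list as a stand-in for ohe.categories_ (the variable names `categories`, `first`, and `as_list` are only illustrative, not part of scikit-learn):

```python
import numpy as np

# Stand-in for ohe.categories_: a plain Python list holding one NumPy array.
categories = [np.array(["EEUU", "France", "Portugal", "Italy"], dtype=object)]

first = categories[0]     # ordinary list indexing: picks the array, converts nothing
as_list = first.tolist()  # explicit conversion of that array to a Python list

print(type(categories).__name__)  # list
print(type(first).__name__)       # ndarray
print(type(as_list).__name__)     # list
```

So [0] and tolist() do different jobs: the first selects an element of the outer list, the second turns a NumPy array into a list.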

ohe.categories_ is a list because it holds a separate NumPy array for each column in your input. Since df[["country"]] has only one column, the list has only one item.

If you did df[["country", "second_column"]], for instance, you'd get a list with two arrays, one per column, each listing that column's categories.
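As a rough illustration of the multi-column case, the sketch below fits an encoder on two columns. The data and the "language" column are made up for the example, and the encoder is created with default settings apart from handle_unknown, so treat it as an assumption-laden sketch rather than the asker's exact setup:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data; the "language" column is invented for illustration.
df = pd.DataFrame({
    "country": ["EEUU", "France", "Portugal", "Italy"],
    "language": ["en", "fr", "pt", "it"],
})

ohe = OneHotEncoder(handle_unknown="ignore")
ohe.fit(df[["country", "language"]])

# One array of categories per input column, in column order.
print(len(ohe.categories_))
for column, cats in zip(["country", "language"], ohe.categories_):
    print(column, type(cats).__name__, cats.tolist())
```

Here ohe.categories_[0] holds the categories found in "country" and ohe.categories_[1] those found in "language".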

Upvotes: 1
