How to un-categoricalize a column in pandas

Question

I have a DataFrame in pandas. For sorting purposes, one of the columns is created with:

 df['segVar'] = df['segVar'].astype('category', categories=segVars, ordered=True)

In normal operation, it's saved to a CSV with to_csv and then read back in at a later stage. In this mode, once it's read in, segVar is no longer a category. This is fine, and it's the behavior I want.

For unit testing purposes, however, I'm doing all of this without saving it to a file, and so the segVar column is still a category. This breaks the code, because I do things like df['segVar'].unique() which doesn't work on categoricals.

Basically, I don't want to change the column's contents; I just want it to hold the original values instead of being a categorical.

cs95 · Accepted Answer

If you're starting with something like this -

0    a
1    b
2    c
3    a
4    c
5    c
6    b
dtype: category
Categories (3, object): [a < b < c]

Then s.unique() works on v0.22 for me for categorical columns -

s.unique()

[a, b, c]
Categories (3, object): [a < b < c]

The result is a pandas.core.categorical.Categorical object, not a NumPy array.
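To reproduce this on a recent pandas, here is a minimal sketch. It assumes the data shown above and builds the ordered categorical through CategoricalDtype, since newer releases no longer accept the astype('category', categories=...) form used in the question. The name ordered_abc is made up for illustration.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical data chosen to match the Series printed above.
ordered_abc = CategoricalDtype(categories=["a", "b", "c"], ordered=True)
s = pd.Series(list("abcaccb")).astype(ordered_abc)

uniques = s.unique()
print(type(uniques).__name__)  # class name of the returned object
print(uniques.tolist())        # distinct values as a plain Python list
```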

If you'd rather have a plain Python list, call tolist() on it:

s.unique().tolist()
['a', 'b', 'c']

If that doesn't work, you can convert the column to str. You end up with essentially the same values.

s.astype(str).unique()
array(['a', 'b', 'c'], dtype=object)

In this case, the result is a NumPy array.
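If the goal is to drop the categorical dtype from the DataFrame column itself, so that downstream code sees the same thing it would after a CSV round trip, a cast on the column should do it. This is only a sketch: seg_vars and the sample data are hypothetical stand-ins for the asker's segVars and df, and astype(object) is used instead of astype(str) on the assumption that the original values should be kept as they are rather than stringified.

```python
import pandas as pd

# Hypothetical stand-ins for the asker's segVars list and DataFrame.
seg_vars = ["a", "b", "c"]
df = pd.DataFrame({"segVar": list("abcaccb")})
df["segVar"] = pd.Categorical(df["segVar"], categories=seg_vars, ordered=True)

# Cast back to a plain object column; the values themselves are unchanged.
df["segVar"] = df["segVar"].astype(object)

unique_values = df["segVar"].unique()
print(df["segVar"].dtype)
print(unique_values)
```

One thing to keep in mind: depending on the pandas version, astype(str) can turn missing values into literal strings, while astype(object) leaves them as missing.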
