Reputation: 12187
I have a DataFrame in pandas. For sorting purposes, one of the columns is created with:
df['segVar'] = df['segVar'].astype('category', categories=segVars, ordered=True)
In normal operation, it's saved to a CSV with to_csv
and then read back in at a later stage. In this mode, once it's read in, segVar
is no longer a category. This is fine, and it's the behavior I want.
For unit testing purposes, however, I'm doing all of this without saving it to a file, and so the segVar
column is still a category. This breaks the code, because I do things like df['segVar'].unique()
which doesn't work on categoricals.
Basically, I want to leave the column unchanged, except that instead of being a categorical it should hold the original values.
Upvotes: 1
Views: 246
Reputation: 402814
If you're starting with something like this -
0 a
1 b
2 c
3 a
4 c
5 c
6 b
dtype: category
Categories (3, object): [a < b < c]
Then s.unique()
works for me on pandas v0.22
with categorical columns -
s.unique()
[a, b, c]
Categories (3, object): [a < b < c]
This is a Categorical (pandas.core.categorical.Categorical in v0.22)
object.
Alternatively,
s.unique().tolist()
['a', 'b', 'c']
Alternatively, if that doesn't work, you can just convert to a str
column; you're essentially getting the same thing in the end.
s.astype(str).unique()
array(['a', 'b', 'c'], dtype=object)
In this case, the result is a NumPy array rather than a Categorical.
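Putting the pieces together, here is a minimal sketch of turning the categorical column back into plain values without going through a file. The names seg_vars, plain and unique_values are made up for illustration, and the column is built with CategoricalDtype (available since pandas 0.21) as an assumption about how you'd write the cast on a recent pandas. It uses astype(object) instead of astype(str) so the values keep their original Python types, which matters if the categories aren't strings.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical category order, standing in for segVars from the question
seg_vars = ["a", "b", "c"]
df = pd.DataFrame({"segVar": ["a", "b", "c", "a", "c", "c", "b"]})

# Build the ordered categorical column
df["segVar"] = df["segVar"].astype(CategoricalDtype(categories=seg_vars, ordered=True))

# unique() on the categorical column returns a Categorical; tolist() gives a plain list
unique_from_categorical = df["segVar"].unique().tolist()

# Drop the categorical dtype but keep the original values
plain = df["segVar"].astype(object)

# unique() on the plain column returns a NumPy array, in order of first appearance
unique_values = plain.unique()
```

For a unit test, you could apply the astype(object) step to the in-memory frame right where the CSV round trip would otherwise happen, so the later stage sees a non-categorical column in both cases.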
Upvotes: 2