I want to rename the categories and add the missing categories to a Series.
My code:
codedCol = bdAu['Bordersite']  # bdAu is the DataFrame being recoded (not shown)
print('pre:')
print(codedCol.head(10))
codedCol = codedCol.astype('category')
codedCol = codedCol.cat.set_categories(['a','b','c','d','e','f','g','h','i','j'])
print('post:')
print(codedCol.head(10))
When I do this, every value becomes NaN:
pre:
0 3
1 3
2 2
3 2
4 3
5 4
6 5
7 3
8 3
9 3
Name: Bordersite, dtype: int64
post:
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 NaN
dtype: category
Categories (10, object): [a, b, c, d, ..., g, h, i, j]
What am I doing wrong here?
Thanks Kheeran
First create the categories. You can use .astype('category'), but then the categories are taken only from the values that occur in your column. Alternatively, use Categorical with the categories parameter, where you define the categories explicitly.
You can use:
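import pandas as pd

codedCol = bdAu['Bordersite']
# declare all ten integer codes 0-9 as categories, even those not present in the data
codedCol = pd.Series(pd.Categorical(codedCol, categories=[0,1,2,3,4,5,6,7,8,9]))
print(codedCol)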
0 3
1 3
2 2
3 2
4 3
5 4
6 5
7 3
8 3
9 3
dtype: category
Categories (10, int64): [0, 1, 2, 3, ..., 6, 7, 8, 9]
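A self-contained sketch of why declaring the categories matters, assuming a stand-in Series built from the first ten values printed in the question (the real bdAu DataFrame isn't available). .astype('category') keeps only the values that actually occur, while pd.Categorical with categories= keeps all ten codes.

```python
import pandas as pd

# Stand-in for bdAu['Bordersite']: the first ten values printed in the question
codedCol = pd.Series([3, 3, 2, 2, 3, 4, 5, 3, 3, 3], name='Bordersite')

# astype('category') infers the categories from the observed values only
inferred = codedCol.astype('category')
print(inferred.cat.categories.tolist())  # [2, 3, 4, 5]

# pd.Categorical with categories= declares every code 0-9, including unused ones
declared = pd.Series(pd.Categorical(codedCol, categories=list(range(10))))
print(declared.cat.categories.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Only the declared version has ten categories, which is what the positional rename to ten letters needs.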
Then use rename_categories. The number of new names must equal the number of existing categories, otherwise you get this error:
ValueError: new categories need to have the same number of items than the old categories!
codedCol = codedCol.cat.rename_categories(['a','b','c','d','e','f','g','h','i','j'])
print(codedCol)
0 d
1 d
2 c
3 c
4 d
5 e
6 f
7 d
8 d
9 d
dtype: category
Categories (10, object): [a, b, c, d, ..., g, h, i, j]
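Both steps can be checked end to end with the same stand-in data (an assumption; the real values come from bdAu). The sketch also shows the length check: renaming the four inferred categories with ten names raises ValueError. The exact message wording differs between pandas versions, so only the exception type is relied on here.

```python
import pandas as pd

letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
# Stand-in for bdAu['Bordersite']
codedCol = pd.Series([3, 3, 2, 2, 3, 4, 5, 3, 3, 3], name='Bordersite')

# Step 1: declare all ten integer codes as categories
coded = pd.Series(pd.Categorical(codedCol, categories=list(range(10))))
# Step 2: rename positionally, 0 -> 'a', 1 -> 'b', ..., 9 -> 'j'
coded = coded.cat.rename_categories(letters)
print(coded.tolist())  # ['d', 'd', 'c', 'c', 'd', 'e', 'f', 'd', 'd', 'd']

# Renaming only the observed categories [2, 3, 4, 5] with ten names fails
mismatch_error = None
try:
    codedCol.astype('category').cat.rename_categories(letters)
except ValueError as exc:
    mismatch_error = exc
    print('ValueError:', exc)
```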
You've set the categories to ['a','b','c','d','e','f','g','h','i','j']. None of the current values in codedCol (the integers shown in the "pre" output) matches any of these categories, so each one is set to NaN. set_categories does not rename the existing categories; it only declares which values are allowed. For further reading, consider this example from the docs:
In [10]: raw_cat = pd.Categorical(["a","b","c","a"], categories=["b","c","d"],
....: ordered=False)
....:
In [11]: s = pd.Series(raw_cat)
In [12]: s
Out[12]:
0 NaN
1 b
2 c
3 NaN
dtype: category
Categories (3, object): [b, c, d]
Since "a" is not one of the categories, it gets set to NaN.
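The question's case can be reproduced the same way, again assuming a stand-in Series for bdAu['Bordersite']. No integer equals a letter, so set_categories turns every value into NaN. As an alternative to the two-step approach (not what either answer uses), the codes can be mapped to letters first and then given a CategoricalDtype that lists all ten letters, so unused letters are still kept as categories. The dictionary code_to_letter is a hypothetical helper for illustration.

```python
import pandas as pd

letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
# Stand-in for bdAu['Bordersite']
codedCol = pd.Series([3, 3, 2, 2, 3, 4, 5, 3, 3, 3], name='Bordersite')

# Reproduction: the integers 2-5 are not among the new categories 'a'-'j',
# so set_categories replaces every value with NaN
broken = codedCol.astype('category').cat.set_categories(letters)
print(broken.isna().all())  # True

# Alternative: translate code -> letter, then declare all ten letters as categories
code_to_letter = dict(zip(range(10), letters))  # hypothetical helper mapping
fixed = codedCol.map(code_to_letter).astype(pd.CategoricalDtype(categories=letters))
print(fixed.tolist())
print(fixed.cat.categories.tolist())
```

Mapping first makes the code-to-letter correspondence explicit instead of relying on category order, which may be easier to audit if the codes are not a contiguous 0-9 range.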