user2663139

Reputation: 189

Rename the categories and add the missing categories to a pandas Series

I want to rename the categories and add the missing categories to a Series.

My code:

codedCol = bdAu['Bordersite']
print('pre:')
print(codedCol.head(10))
codedCol = codedCol.astype('category')
codedCol = codedCol.cat.set_categories(['a','b','c','d','e','f','g','h','i','j'])
print('post:')
print(codedCol.head(10))

When I do this, every value becomes NaN:

pre:
0    3
1    3
2    2
3    2
4    3
5    4
6    5
7    3
8    3
9    3
Name: Bordersite, dtype: int64
post:
0    NaN
1    NaN
2    NaN
3    NaN
4    NaN
5    NaN
6    NaN
7    NaN
8    NaN
9    NaN
dtype: category
Categories (10, object): [a, b, c, d, ..., g, h, i, j]

What am I doing wrong here?

Thanks, Kheeran

Upvotes: 1

Views: 10295

Answers (3)

Stuart Hallows

Reputation: 8953

Use add_categories (via the .cat accessor) to add categories to a Series.
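As a minimal sketch of that suggestion: the Series below is a hypothetical stand-in for bdAu['Bordersite'], built from the ten values printed in the question, and the target range 0..9 is an assumption. Note that add_categories appends the new categories after the existing ones, so the resulting order is not sorted.

```python
import pandas as pd

# Hypothetical stand-in for bdAu['Bordersite'], using the values shown in the question
s = pd.Series([3, 3, 2, 2, 3, 4, 5, 3, 3, 3], name="Bordersite").astype("category")

# Categories inferred from the data by astype('category')
present = list(s.cat.categories)

# Add only the codes from 0..9 (assumed full range) that are not categories yet
missing = [v for v in range(10) if v not in present]
s = s.cat.add_categories(missing)

print(list(s.cat.categories))
print(s.isna().sum())  # existing values are untouched, so no NaN is introduced
```

If the category order matters, set_categories with the full list (or reorder_categories) may be a better fit than appending.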

Upvotes: 2

jezrael

Reputation: 862771

To create categories you can use .astype('category'), but then the categories are taken only from the values present in your column. Alternatively, use Categorical with the categories parameter, where the full set of categories is defined explicitly.

You can use:

codedCol = bdAu['Bordersite']
codedCol = pd.Series(pd.Categorical(codedCol, categories=[0,1,2,3,4,5,6,7,8,9]))
print (codedCol)
0    3
1    3
2    2
3    2
4    3
5    4
6    5
7    3
8    3
9    3
dtype: category
Categories (10, int64): [0, 1, 2, 3, ..., 6, 7, 8, 9]

Then use rename_categories. The new list must have the same number of items as the existing categories, otherwise you get this error:

ValueError: new categories need to have the same number of items than the old categories!

codedCol = codedCol.cat.rename_categories(['a','b','c','d','e','f','g','h','i','j'])
print (codedCol)
0    d
1    d
2    c
3    c
4    d
5    e
6    f
7    d
8    d
9    d
dtype: category
Categories (10, object): [a, b, c, d, ..., g, h, i, j]
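The two steps can also be chained on the Series itself. This is only a sketch under assumptions: the sample data is copied from the question's output rather than loaded from bdAu, and the dict passed to rename_categories requires a pandas version that accepts dict-like arguments there.

```python
import pandas as pd

# Sample data copied from the question's printed output (bdAu itself is not available here)
coded = pd.Series([3, 3, 2, 2, 3, 4, 5, 3, 3, 3], name="Bordersite")

# Step 1: declare all ten integer codes as categories, so the existing values still match
coded = coded.astype("category").cat.set_categories(list(range(10)))

# Step 2: relabel each integer category with a letter; the dict makes the mapping explicit
labels = dict(zip(range(10), "abcdefghij"))
coded = coded.cat.rename_categories(labels)

print(coded.head(10))
```

Using a dict avoids relying on the positional order of the categories, which a plain list of new names depends on.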

Upvotes: 2

Jossie Calderon

Reputation: 1425

You've set the categories to ['a','b','c','d','e','f','g','h','i','j']. None of the current values in codedCol match any of these categories, so they are all reset to NaN. For further reading, consider this example from the docs:

In [10]: raw_cat = pd.Categorical(["a","b","c","a"], categories=["b","c","d"],
   ....:                          ordered=False)
   ....: 
In [11]: s = pd.Series(raw_cat)

In [12]: s
Out[12]: 
0    NaN
1      b
2      c
3    NaN
dtype: category
Categories (3, object): [b, c, d]

Since "a" is not a category, it gets re-set to NaN.
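The same effect can be reproduced with the values from the question. The snippet below is a sketch, with the Series rebuilt by hand from the printed output; it shows that set_categories replaces the category list rather than relabelling it, and it lists which existing values have no match in the new list.

```python
import pandas as pd

# Values copied from the question's "pre:" output
raw = pd.Series([3, 3, 2, 2, 3, 4, 5, 3, 3, 3])
letters = list("abcdefghij")

# set_categories swaps in a new category list; values absent from it become NaN
replaced = raw.astype("category").cat.set_categories(letters)
print(replaced.isna().all())

# Values in the data that do not appear in the proposed category list
unmatched = sorted(int(v) for v in set(raw.unique()) - set(letters))
print(unmatched)
```

A check like this before calling set_categories makes it easier to see whether the intent is to replace the categories or to rename them.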

Upvotes: 1
