user308827

Reputation: 21971

adding up columns in pandas dataframe results in categorical index error

I have the following dataframe:

ps_yd_1               $0^{th} - 25^{th}$  $25^{th} - 50^{th}$  $50^{th} - 75^{th}$  $75^{th} - 100^{th}$
ps_variable_1
$0^{th} - 25^{th}$             47.566800            23.441332            21.237253              7.754615
$25^{th} - 50^{th}$            32.764905            40.947438             8.634613             17.653044
$50^{th} - 75^{th}$            10.830286            21.435877            14.684188             53.049650
$75^{th} - 100^{th}$           14.388537            33.796734            13.072976             38.741753

I want to add 2 columns to create a new one:

df_hmp['a'] = df_hmp['$0^{th} - 25^{th}$'] + df_hmp['$25^{th} - 50^{th}$']

but I get this error:

*** TypeError: cannot insert an item into a CategoricalIndex that is not already an existing category

This is what the index looks like:

CategoricalIndex(['$0^{th} - 25^{th}$', '$25^{th} - 50^{th}$',
                  '$50^{th} - 75^{th}$', '$75^{th} - 100^{th}$'],
                 categories=['$0^{th} - 25^{th}$', '$25^{th} - 50^{th}$', '$50^{th} - 75^{th}$', '$75^{th} - 100^{th}$'], ordered=True, name='ps_variable_1', dtype='category')

How can I fix this?

Upvotes: 2

Views: 805

Answers (1)

DYZ

Reputation: 57033

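Both the rows and the columns of your dataframe use a CategoricalIndex. Assigning df_hmp['a'] inserts the label 'a' into the column index, and a CategoricalIndex only accepts labels that are already among its categories. So before adding the column, you must first add 'a' as a new category of the column index.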
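
A minimal sketch of how a new category is registered on a CategoricalIndex, using toy labels low and high chosen only for illustration. Note that add_categories does not change the index values themselves; it only widens the set of labels the index will accept later:

```python
import pandas as pd

# A small ordered CategoricalIndex, standing in for the column index in the question
cols = pd.CategoricalIndex(["low", "high"], categories=["low", "high"], ordered=True)

# add_categories returns a NEW index; the original is left unchanged
cols2 = cols.add_categories("a")

print(list(cols.categories))   # ['low', 'high']
print(list(cols2.categories))  # ['low', 'high', 'a']
print(list(cols2))             # ['low', 'high'] (values unchanged; 'a' is only a permitted label)
```

Because the result is a new object, it has to be assigned back (for example to df.columns) to have any effect on a dataframe.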

Let's first recreate your dataframe:

import pandas as pd

# Each inner list holds one column's values, hence the transpose (.T)
df_hmp = pd.DataFrame([[47.566800, 32.764905, 10.830286, 14.388537],
                       [23.441332, 40.947438, 21.435877, 33.796734],
                       [21.237253, 8.634613, 14.684188, 13.072976],
                       [7.754615, 17.653044, 53.049650, 38.741753]]).T

# Ordered categorical labels, used for both the rows and the columns
idx = pd.CategoricalIndex(['$0^{th} - 25^{th}$', '$25^{th} - 50^{th}$',
                           '$50^{th} - 75^{th}$', '$75^{th} - 100^{th}$'],
                          categories=['$0^{th} - 25^{th}$', '$25^{th} - 50^{th}$',
                                      '$50^{th} - 75^{th}$', '$75^{th} - 100^{th}$'],
                          ordered=True, name='ps_variable_1')
df_hmp.columns = idx
df_hmp.index = idx.copy()  # separate copy, so renaming the columns below leaves the row index name alone
df_hmp.columns.name = 'ps_yd_1'

Now add 'a' to the categories of the column index, then create the column:

# add_categories returns a new index, so assign it back to df_hmp.columns
df_hmp.columns = df_hmp.columns.add_categories('a')
df_hmp['a'] = df_hmp['$0^{th} - 25^{th}$'] + df_hmp['$25^{th} - 50^{th}$']
# Works like a charm
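
If you add columns in several places, a small helper can register the label before assigning. This is a minimal sketch under stated assumptions: add_column is a hypothetical name (not a pandas function), and the demo frame with column labels x and y is made up for illustration:

```python
import pandas as pd


def add_column(df, label, values):
    """Hypothetical helper: set df[label] = values, first registering `label`
    as a category when the column index is a CategoricalIndex."""
    if isinstance(df.columns, pd.CategoricalIndex) and label not in df.columns.categories:
        # add_categories returns a new index, so assign it back
        df.columns = df.columns.add_categories(label)
    df[label] = values
    return df


# Small demo frame whose column index is categorical, like df_hmp
cols = pd.CategoricalIndex(["x", "y"], categories=["x", "y"], ordered=True, name="ps_yd_1")
df = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], columns=cols)

add_column(df, "a", df["x"] + df["y"])
print(df["a"].tolist())  # [3.0, 7.0]
```

The helper only touches the categories when the label is missing, so calling it again with the same label simply overwrites the existing column.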

Upvotes: 3
