DocZerø

Reputation: 8567

Error when using Pandas pivot_table with margins=True

My code (from the O'Reilly book Python Data Science Handbook):

Full disclosure: at the time of writing, the book is still in early release, meaning that it's still unedited and in its raw form.

import numpy as np
import pandas as pd
import seaborn as sns
titanic = sns.load_dataset('titanic')

titanic.pivot_table('survived', index='sex', columns='class')

The result is:

[image: the resulting DataFrame of survival rates, with sex as rows and class as columns]
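For context, pivot_table's default aggfunc is the mean, so this call averages the 0/1 "survived" flag for each (sex, class) pair. A minimal sketch of the same computation written as a groupby, using a small hypothetical toy DataFrame in place of the seaborn titanic data (which needs a download):

```python
import pandas as pd

# Hypothetical toy data standing in for the seaborn titanic dataset
df = pd.DataFrame({
    "sex": ["female", "female", "male", "male", "male", "female"],
    "class": ["First", "Third", "First", "Third", "Third", "First"],
    "survived": [1, 0, 1, 0, 1, 1],
})

# pivot_table with the default aggfunc ("mean")
pivot = df.pivot_table("survived", index="sex", columns="class")

# Same table via groupby: mean per (sex, class) pair, then move
# the "class" level from the row index into the columns
grouped = df.groupby(["sex", "class"])["survived"].mean().unstack()

print(pivot)
```

Both approaches yield the same survival-rate table; pivot_table is essentially a convenience wrapper around this group-then-reshape pattern.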

However, if I now try to add totals using the margins keyword, the following error occurs:

titanic.pivot_table('survived', index='sex', columns='class', margins=True)

TypeError: cannot insert an item into a CategoricalIndex that is not already an existing category

Any idea what could be causing this?

Version info:

Upvotes: 2

Views: 2505

Answers (2)

Craig Slocombe

Reputation: 1

pivot_table creates a new DataFrame, so you need to assign it to a new variable: new_var = titanic.pivot_table(...)

margins=True (a boolean flag) adds an "All" row and column containing the mean (average) of each row and column, since the default aggfunc is 'mean'.

Just remove margins=True and use aggfunc='sum' instead.
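A minimal sketch of that suggestion, using a small hypothetical toy DataFrame rather than the real titanic data. Note that aggfunc="sum" answers a different question than the original call: it counts survivors per cell instead of computing survival rates, and it adds no totals.

```python
import pandas as pd

# Hypothetical toy data standing in for the seaborn titanic dataset
df = pd.DataFrame({
    "sex": ["female", "female", "male", "male", "male", "female"],
    "class": ["First", "Third", "First", "Third", "Third", "First"],
    "survived": [1, 0, 1, 0, 1, 1],
})

# Assign the new DataFrame to a variable; summing the 0/1 flag
# gives the number of survivors in each (sex, class) cell
survivors = df.pivot_table("survived", index="sex", columns="class", aggfunc="sum")
print(survivors)
```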

Upvotes: 0

jakevdp

Reputation: 86433

This appears to be due to a change between pandas 0.15 and 0.16. In previous versions, the titanic dataset had these dtypes:

In [1]: import pandas, seaborn

In [2]: pandas.__version__
'0.15.2'

In [3]: titanic = seaborn.load_dataset('titanic')

In [4]: titanic.dtypes
Out[4]: 
survived         int64
pclass           int64
sex             object
age            float64
sibsp            int64
parch            int64
fare           float64
embarked        object
class           object
who             object
adult_male        bool
deck            object
embark_town     object
alive           object
alone             bool
dtype: object

With the newer pandas:

In [1]: import pandas, seaborn

In [2]: pandas.__version__
'0.16.2'

In [3]: titanic = seaborn.load_dataset('titanic')

In [4]: titanic.dtypes
Out[4]: 
survived          int64
pclass            int64
sex              object
age             float64
sibsp             int64
parch             int64
fare            float64
embarked         object
class          category
who              object
adult_male         bool
deck           category
embark_town      object
alive            object
alone              bool
dtype: object

Several columns (class and deck) are now automatically converted to categorical, which triggers this bug. The book is currently unpublished and unedited; I'll be sure to test with recent releases and fix these types of errors before publication.
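To check which columns of a DataFrame carry the categorical dtype (and so could trigger this error), select_dtypes can filter on "category". A minimal sketch on a small hypothetical toy frame:

```python
import pandas as pd

# Hypothetical toy frame: "class" is categorical, the rest are not
df = pd.DataFrame({
    "sex": ["female", "male", "male"],
    "class": pd.Categorical(["First", "Third", "Second"],
                            categories=["First", "Second", "Third"]),
    "survived": [1, 0, 1],
})

# Names of all columns stored with the pandas "category" dtype
categorical_cols = df.select_dtypes(include="category").columns.tolist()
print(categorical_cols)  # ['class']
```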

For now, here is a workaround:

In [5]: titanic['class'] = titanic['class'].astype(object)

In [6]: titanic.pivot_table('survived', index='sex', columns='class', margins=True)
Out[6]: 
class      First    Second     Third       All
sex                                           
female  0.968085  0.921053  0.500000  0.742038
male    0.368852  0.157407  0.135447  0.188908
All     0.629630  0.472826  0.242363  0.383838
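The same workaround can be generalized to cast every categorical column back to object before pivoting. A minimal sketch, assuming a hypothetical helper named decategorize and a small toy DataFrame in place of the titanic data:

```python
import pandas as pd

# Hypothetical toy data; "class" is categorical as in newer seaborn/pandas
df = pd.DataFrame({
    "sex": ["female", "female", "male", "male", "male", "female"],
    "class": pd.Categorical(["First", "Third", "First", "Third", "Third", "First"],
                            categories=["First", "Second", "Third"]),
    "survived": [1, 0, 1, 0, 1, 1],
})

def decategorize(frame):
    """Hypothetical helper: return a copy with every category column cast to object."""
    out = frame.copy()
    for col in out.select_dtypes(include="category").columns:
        out[col] = out[col].astype(object)
    return out

# With plain object columns, margins=True can append the "All" row and column
table = decategorize(df).pivot_table("survived", index="sex", columns="class",
                                     margins=True)
print(table)
```

Because the columns are plain objects after the cast, only the class values actually present in the data ("First" and "Third" here) appear in the result, followed by the "All" margin.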

Edit: I submitted this as an issue to the pandas project: https://github.com/pydata/pandas/issues/10989

Upvotes: 5
