Reputation: 8567
My code (from the book Python Data Science Handbook (O'Reilly)):
Full disclosure: at the time of writing, the book is still in early release, meaning that it's still unedited and in its raw form.
import numpy as np
import pandas as pd
import seaborn as sns
titanic = sns.load_dataset('titanic')
titanic.pivot_table('survived', index='sex', columns='class')
The result is:
However, if I now try to add totals using the margins keyword, the following error occurs:
titanic.pivot_table('survived', index='sex', columns='class', margins=True)
TypeError: cannot insert an item into a CategoricalIndex that is not already an existing category
Any idea what could be causing this?
Version info:
Upvotes: 2
Views: 2505
Reputation: 1
pivot_table creates a new DataFrame, so you need to assign the result to a new variable: new_var = titanic.pivot_table(...)
margins=True gives the mean (average) of each column and row (bool or int).
Just delete margins=True and use aggfunc=sum instead.
Upvotes: 0
Reputation: 86433
This appears to be due to a change between pandas 0.15 and 0.16. In previous versions, the columns of the titanic dataset have these dtypes:
In [1]: import pandas, seaborn
In [2]: pandas.__version__
'0.15.2'
In [3]: titanic = seaborn.load_dataset('titanic')
In [4]: titanic.dtypes
Out[4]:
survived int64
pclass int64
sex object
age float64
sibsp int64
parch int64
fare float64
embarked object
class object
who object
adult_male bool
deck object
embark_town object
alive object
alone bool
dtype: object
With the newer pandas:
In [1]: import pandas, seaborn
In [2]: pandas.__version__
'0.16.2'
In [3]: titanic = seaborn.load_dataset('titanic')
In [4]: titanic.dtypes
Out[4]:
survived int64
pclass int64
sex object
age float64
sibsp int64
parch int64
fare float64
embarked object
class category
who object
adult_male bool
deck category
embark_town object
alive object
alone bool
dtype: object
Several columns are automatically converted to categorical, which brings up this bug. The book is currently unpublished and unedited; I'll be sure to test with recent releases and fix these types of errors before publication.
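To see which columns are affected on a given install, one option is to list the columns that carry the categorical dtype. This is only a minimal sketch: the small DataFrame below is a made-up stand-in for the titanic data (so that it runs without downloading anything), and the name categorical_cols is just for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the titanic data; the values are made up.
df = pd.DataFrame({
    "survived": [1, 0, 1, 0],
    "sex": ["female", "male", "female", "male"],
    "class": pd.Categorical(["First", "Third", "Second", "Third"]),
})

# Collect the names of all columns stored with the categorical dtype.
categorical_cols = df.select_dtypes(include="category").columns.tolist()
print(categorical_cols)
```

On the real dataset the same select_dtypes call should pick out the columns shown as category in the dtypes listing above.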
For now, here is a workaround:
In [5]: titanic['class'] = titanic['class'].astype(object)
In [6]: titanic.pivot_table('survived', index='sex', columns='class', margins=True)
Out[6]:
class      First    Second     Third       All
sex
female  0.968085  0.921053  0.500000  0.742038
male    0.368852  0.157407  0.135447  0.188908
All     0.629630  0.472826  0.242363  0.383838
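If more than one categorical column ends up in the pivot, the same workaround can be applied to all of them at once. The following is a rough sketch under assumptions: the helper decategorize is hypothetical (not part of pandas), and the data is a tiny made-up frame rather than the real titanic dataset.

```python
import pandas as pd

# Hypothetical stand-in for the titanic data; the values are made up.
df = pd.DataFrame({
    "survived": [1, 1, 0, 0, 1, 0],
    "sex": ["female", "female", "female", "male", "male", "male"],
    "class": pd.Categorical(
        ["First", "Second", "Third", "First", "Second", "Third"]
    ),
})


def decategorize(frame):
    """Return a copy of frame with every categorical column cast to object.

    Hypothetical helper for illustration; it is not a pandas function.
    """
    out = frame.copy()
    for col in out.select_dtypes(include="category").columns:
        out[col] = out[col].astype(object)
    return out


# With plain object columns, the "All" margin label can be added freely.
table = decategorize(df).pivot_table(
    values="survived", index="sex", columns="class", margins=True
)
print(table)
```

Working on a copy leaves the original DataFrame's categorical columns intact, which may matter if other code relies on the category ordering.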
Edit: I submitted this as an issue to the pandas project: https://github.com/pydata/pandas/issues/10989
Upvotes: 5