Reputation: 4601
I am trying to replicate one example out of Wes McKinney's book on Pandas, the code is here (it assumes all names datafiles are under names folder)
# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd
years = range(1880, 2011)
pieces = []
columns = ['name', 'sex', 'births']
for year in years:
path = 'names/yob%d.txt' % year
frame = pd.read_csv(path, names=columns)
frame['year'] = year
pieces.append(frame)
names = pd.concat(pieces, ignore_index=True)
names
def get_tops(group):
return group.sort_index(by='births', ascending=False)[:1000]
grouped = names.groupby(['year','sex'])
grouped.apply(get_tops)
I am using Pandas 0.10 and Python 2.7. The error I am seeing is this:
Traceback (most recent call last):
File "names.py", line 21, in <module>
grouped.apply(get_tops)
File "/usr/local/lib/python2.7/dist-packages/pandas-0.10.0-py2.7-linux-i686.egg/pandas/core/groupby.py", line 321, in apply
return self._python_apply_general(f)
File "/usr/local/lib/python2.7/dist-packages/pandas-0.10.0-py2.7-linux-i686.egg/pandas/core/groupby.py", line 324, in _python_apply_general
keys, values, mutated = self.grouper.apply(f, self.obj, self.axis)
File "/usr/local/lib/python2.7/dist-packages/pandas-0.10.0-py2.7-linux-i686.egg/pandas/core/groupby.py", line 585, in apply
values, mutated = splitter.fast_apply(f, group_keys)
File "/usr/local/lib/python2.7/dist-packages/pandas-0.10.0-py2.7-linux-i686.egg/pandas/core/groupby.py", line 2127, in fast_apply
results, mutated = lib.apply_frame_axis0(sdata, f, names, starts, ends)
File "reduce.pyx", line 421, in pandas.lib.apply_frame_axis0 (pandas/lib.c:24934)
File "/usr/local/lib/python2.7/dist-packages/pandas-0.10.0-py2.7-linux-i686.egg/pandas/core/frame.py", line 2028, in __setattr__
self[name] = value
File "/usr/local/lib/python2.7/dist-packages/pandas-0.10.0-py2.7-linux-i686.egg/pandas/core/frame.py", line 2043, in __setitem__
self._set_item(key, value)
File "/usr/local/lib/python2.7/dist-packages/pandas-0.10.0-py2.7-linux-i686.egg/pandas/core/frame.py", line 2078, in _set_item
value = self._sanitize_column(key, value)
File "/usr/local/lib/python2.7/dist-packages/pandas-0.10.0-py2.7-linux-i686.egg/pandas/core/frame.py", line 2112, in _sanitize_column
raise AssertionError('Length of values does not match '
AssertionError: Length of values does not match length of index
Any ideas?
Upvotes: 0
Views: 457
Reputation: 353059
I think this was a bug introduced in 0.10, namely issue #2605, "AssertionError when using apply after GroupBy". It's since been fixed.
You can either wait for the 0.10.1 release, which shouldn't be too long from now, or you can upgrade to the development version (either via git
or simply by downloading the zip
of master.)
Upvotes: 2