BBSysDyn

Pandas Group Example Errors

I am trying to replicate an example from Wes McKinney's book on pandas. The code is below (it assumes all of the names data files are in a folder called names):

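# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd

# Load one file per year (names/yob1880.txt ... names/yob2010.txt).
# The files have no header row, so the column names are supplied here.
years = range(1880, 2011)
pieces = []
columns = ['name', 'sex', 'births']
for year in years:
    path = 'names/yob%d.txt' % year
    frame = pd.read_csv(path, names=columns)
    frame['year'] = year
    pieces.append(frame)

# Stack all years into one DataFrame with a fresh integer index.
names = pd.concat(pieces, ignore_index=True)
names  # only displays the frame in an interactive session

def get_tops(group):
    # Keep the 1000 rows with the most births in this group.
    # sort_index(by=...) is the pandas 0.10 spelling; newer releases use sort_values.
    return group.sort_index(by='births', ascending=False)[:1000]

# Top 1000 names for each (year, sex) pair.
grouped = names.groupby(['year', 'sex'])
grouped.apply(get_tops)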
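On current pandas releases, sort_index(by=...) no longer exists and DataFrame.sort_values is used instead. The sketch below shows the same "top N per (year, sex)" step under a few assumptions: the small frame stands in for the names/yob*.txt files and its counts are made up, top_n_per_group is a hypothetical helper name, and it swaps the book's groupby(...).apply(get_tops) for a different technique (sort once, then GroupBy.head(n)). It is not the fix the answer proposes.

```python
import pandas as pd


def top_n_per_group(frame, n=1000):
    """Return the n highest-birth rows within each (year, sex) group.

    Sorting the whole frame by births (descending) first means that
    GroupBy.head(n), which keeps the first n rows of every group in
    their current order, keeps the n largest rows of each group.
    """
    ordered = frame.sort_values("births", ascending=False)
    return ordered.groupby(["year", "sex"]).head(n)


# Hypothetical sample data (made-up counts) standing in for the yob files.
names = pd.DataFrame({
    "name": ["Mary", "Anna", "Emma", "John", "William", "James"],
    "sex": ["F", "F", "F", "M", "M", "M"],
    "births": [70, 26, 20, 96, 95, 59],
    "year": [1880] * 6,
})

top2 = top_n_per_group(names, n=2)
print(top2)
```

The result keeps the original row labels and is ordered by births overall rather than grouped, so add .sort_values(["year", "sex"]) afterwards if a grouped layout is wanted.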

I am using Pandas 0.10 and Python 2.7. The error I am seeing is this:

Traceback (most recent call last):
  File "names.py", line 21, in <module>
    grouped.apply(get_tops)
  File "/usr/local/lib/python2.7/dist-packages/pandas-0.10.0-py2.7-linux-i686.egg/pandas/core/groupby.py", line 321, in apply
    return self._python_apply_general(f)
  File "/usr/local/lib/python2.7/dist-packages/pandas-0.10.0-py2.7-linux-i686.egg/pandas/core/groupby.py", line 324, in _python_apply_general
    keys, values, mutated = self.grouper.apply(f, self.obj, self.axis)
  File "/usr/local/lib/python2.7/dist-packages/pandas-0.10.0-py2.7-linux-i686.egg/pandas/core/groupby.py", line 585, in apply
    values, mutated = splitter.fast_apply(f, group_keys)
  File "/usr/local/lib/python2.7/dist-packages/pandas-0.10.0-py2.7-linux-i686.egg/pandas/core/groupby.py", line 2127, in fast_apply
    results, mutated = lib.apply_frame_axis0(sdata, f, names, starts, ends)
  File "reduce.pyx", line 421, in pandas.lib.apply_frame_axis0 (pandas/lib.c:24934)
  File "/usr/local/lib/python2.7/dist-packages/pandas-0.10.0-py2.7-linux-i686.egg/pandas/core/frame.py", line 2028, in __setattr__
    self[name] = value
  File "/usr/local/lib/python2.7/dist-packages/pandas-0.10.0-py2.7-linux-i686.egg/pandas/core/frame.py", line 2043, in __setitem__
    self._set_item(key, value)
  File "/usr/local/lib/python2.7/dist-packages/pandas-0.10.0-py2.7-linux-i686.egg/pandas/core/frame.py", line 2078, in _set_item
    value = self._sanitize_column(key, value)
  File "/usr/local/lib/python2.7/dist-packages/pandas-0.10.0-py2.7-linux-i686.egg/pandas/core/frame.py", line 2112, in _sanitize_column
    raise AssertionError('Length of values does not match '
AssertionError: Length of values does not match length of index

Any ideas?

Answers (1)

DSM

I think this was a bug introduced in 0.10, namely issue #2605, "AssertionError when using apply after GroupBy". It's since been fixed.

You can either wait for the 0.10.1 release, which shouldn't be far off, or upgrade to the development version (either via git or simply by downloading the zip of master).
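When deciding between waiting and upgrading, it can help to check which release is actually installed. This is a rough sketch that assumes, following the answer, that only the 0.10.0 release is affected. The helpers leading_version and may_hit_issue_2605 are hypothetical names, not part of pandas, and development builds are judged only by their leading version numbers.

```python
import re


def leading_version(version):
    """Return the leading (major, minor, patch) integers of a version string.

    Suffixes such as '.dev-1a2b3c' are ignored; missing parts count as 0.
    """
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % (version,))
    return tuple(int(part) if part else 0 for part in match.groups())


def may_hit_issue_2605(version):
    # Assumption taken from the answer: the bug appeared in 0.10 and is
    # fixed for 0.10.1, so only 0.10.0 is flagged. A 0.10.1 dev build from
    # before the fix would not be caught by this rough check.
    return leading_version(version) == (0, 10, 0)


for v in ["0.10.0", "0.10.1", "0.9.1", "0.10.1.dev-abc"]:
    print(v, may_hit_issue_2605(v))
```

In practice you would pass pandas.__version__ to may_hit_issue_2605.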
