Reputation: 37928
Let's say I want the sum of numbers by letter in the DataFrame below:
In [10]: df
Out[10]:
  letter  number
0      A       1
1      A       2
2      B       3
3      B       4
4      C       5
5      C       6
[6 rows x 2 columns]
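For anyone who wants to reproduce this, a frame like the one above can be built roughly as follows. This is a minimal sketch; the construction code is an assumption and not part of the original session:

```python
import pandas as pd

# Rebuild the example frame shown above: six rows, two columns.
df = pd.DataFrame({
    "letter": ["A", "A", "B", "B", "C", "C"],
    "number": [1, 2, 3, 4, 5, 6],
})
```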
This is really easy to accomplish:
In [11]: df.groupby('letter')[['number']].sum()
Out[11]:
        number
letter
A            3
B            7
C           11
[3 rows x 1 columns]
But if I misspell the column name, I get NaN values instead of an error:
In [12]: df.groupby('letter')[['numberrrrr']].sum()
Out[12]:
        numberrrrr
letter
A              NaN
B              NaN
C              NaN
[3 rows x 1 columns]
This led our team on quite a chase to determine where the bug was. Instead, we'd like an error, such as the one raised when selecting a single column:
In [13]: df.groupby('letter')['numberrrrr'].sum()
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-13-8ebcdeee8710> in <module>()
----> 1 df.groupby('letter')['numberrrrr'].sum()
/usr/local/Anaconda/lib/python2.7/site-packages/pandas/core/groupby.pyc in __getitem__(self, key)
2475 else:
2476 if key not in self.obj: # pragma: no cover
-> 2477 raise KeyError(str(key))
2478 # kind of a kludge
2479 return SeriesGroupBy(self.obj[key], selection=key,
KeyError: 'numberrrrr'
Is there any particular reason that returning a DataFrame from an aggregation doesn't raise an error when the requested column is missing?
This is on pandas 0.13.1.
Upvotes: 1
Views: 484
Reputation: 129018
This is fixed in master/0.14.0 (releasing at the end of the week); rc1 is available if you'd like to try it:
In [7]: df.groupby('letter')[['number']].sum()
Out[7]:
        number
letter
A            3
B            7
C           11
In [8]: df.groupby('letter')[['numberrrr']].sum()
KeyError: "Columns not found: 'numberrrr'"
In [9]: pd.__version__
Out[9]: '0.14.0rc1-43-g0dec048'
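If upgrading isn't an option right away, one possible workaround is to check the column names yourself before selecting them on the groupby. The sketch below is only an illustration: `select_group_columns` is a hypothetical helper, not a pandas function, and its error message merely imitates the one shown above.

```python
import pandas as pd


def select_group_columns(df, by, columns):
    """Hypothetical helper (not part of pandas): group `df` by `by` and
    select `columns`, raising KeyError if any requested column is missing."""
    missing = [col for col in columns if col not in df.columns]
    if missing:
        # Fail loudly instead of silently producing a column of NaN.
        raise KeyError("Columns not found: " + ", ".join(repr(c) for c in missing))
    return df.groupby(by)[columns]


df = pd.DataFrame({
    "letter": ["A", "A", "B", "B", "C", "C"],
    "number": [1, 2, 3, 4, 5, 6],
})

# Correct column name: the usual per-letter sums.
totals = select_group_columns(df, "letter", ["number"]).sum()

# Misspelled column name: the helper raises before any aggregation runs.
try:
    select_group_columns(df, "letter", ["numberrrr"]).sum()
except KeyError as exc:
    error_message = str(exc)
```

The check runs before the groupby selection, so it should behave the same regardless of how a given pandas version handles missing columns in a list selection.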
Upvotes: 3