Reputation: 674
I have the following code, which for some reason results in the 'key' column disappearing. I have also noticed other cases where the key column seems to disappear 'randomly'. I am trying to isolate those cases, and this is one of them.
I am using pandas version 0.20.1.
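import pandas as pd

DF = pd.DataFrame([['a', 1], ['b', 2], ['b', 3]], columns=['G', 'N'])
groupByObj = DF.groupby('G')
print(groupByObj.get_group('b'))  # first call: both columns are shown
groupByObj.sum()                  # result is discarded; only the side effect matters here
print(groupByObj.get_group('b'))  # second call: 'G' is missing on pandas 0.20.1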
The first call to groupByObj.get_group('b') prints:
   G  N
1  b  2
2  b  3
The second call to groupByObj.get_group('b') prints:
   N
1  2
2  3
Why does the 'key' column ('G') disappear after running groupByObj.sum()?
Upvotes: 5
Views: 875
Reputation: 6766
This is a bug in Pandas, discussed in:
The latter is still open.
From reading a bit on GitHub, and as mentioned in the comments, it seems that the second output is the intended behavior, and that it was obtained in the sum case by adding the following line to pandas.core.groupby._GroupBy#_set_group_selection:
self._reset_cache('_selected_obj')
Since this reset happens only once sum (or one of a few other functions) is called, the G column is still visible on the first get_group call. By the way, the reset is also not performed when calling mean and a few other functions. It seems that this bug is broader than originally thought, and that the simple cache reset did not fully solve it.
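Until the behavior is consistent, one way to sidestep the cached selection is to avoid get_group and select the rows from the original frame instead. The following is only a sketch of that idea, not something taken from the GitHub discussion: rows_for_group is a hypothetical helper name, and it relies on the groups attribute of the groupby object, which maps each key to the index labels of its rows.

```python
import pandas as pd

DF = pd.DataFrame([['a', 1], ['b', 2], ['b', 3]], columns=['G', 'N'])
groupByObj = DF.groupby('G')


def rows_for_group(frame, grouped, key):
    """Hypothetical helper: look up the row labels of one group and
    select those rows from the original frame, so every column
    (including the grouping column) is returned."""
    return frame.loc[grouped.groups[key]]


before = rows_for_group(DF, groupByObj, 'b')
groupByObj.sum()  # the aggregation that triggered the problem in the question
after = rows_for_group(DF, groupByObj, 'b')

print(before)
print(after)
```

Because the rows are taken from DF itself, the result should not depend on whether an aggregation was called on the groupby object beforehand.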
Upvotes: 1