BrainPermafrost

Reputation: 674

In Pandas, why does the groupby 'key' column disappear in this scenario?

I have the following code, which for some reason results in the 'key' column disappearing. I have also noticed other times when the key column seems to disappear 'randomly'. I am trying to isolate the cases, and this is one of them.

I am using pandas version 0.20.1.

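import pandas as pd

DF = pd.DataFrame([['a', 1], ['b', 2], ['b', 3]], columns=['G', 'N'])
groupByObj = DF.groupby('G')
print(groupByObj.get_group('b'))  # first print: 'G' is present
groupByObj.sum()                  # result is discarded
print(groupByObj.get_group('b'))  # second print: 'G' is gone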

The first print groupByObj.get_group('b') results in:

   G  N
1  b  2
2  b  3

The second print groupByObj.get_group('b') results in:

   N
1  2
2  3

Why does the 'key' column ('G') disappear after running groupByObj.sum()?

Upvotes: 5

Views: 875

Answers (1)

Shovalt

Reputation: 6766

This is a bug in Pandas, discussed in:

The latter is still open.

From reading the GitHub discussion, and as mentioned in the comments, it seems that the second output is the intended behavior. In the sum case it was obtained by adding the following line to pandas.core.groupby._GroupBy#_set_group_selection:

self._reset_cache('_selected_obj')

Because this reset happens when sum (and a few other functions) is called, the G column is still visible in the first get_group call, which runs before sum. Incidentally, the reset is not performed when calling mean and a few other functions. So the bug appears to be broader than first thought, and the simple cache reset did not fully resolve it.
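If you just need the rows of one group with the key column included, one way to sidestep the issue is to select from the original DataFrame rather than rely on get_group. This is only a sketch of a workaround, not something taken from the GitHub discussion; the names rows_mask and rows_labels are made up for illustration, and I am assuming the groups attribute maps each key to that group's index labels.

```python
import pandas as pd

DF = pd.DataFrame([['a', 1], ['b', 2], ['b', 3]], columns=['G', 'N'])
groupByObj = DF.groupby('G')
groupByObj.sum()  # the call that changed later get_group output in the question

# Option 1: boolean mask on the original frame; the groupby object is not involved
rows_mask = DF[DF['G'] == 'b']

# Option 2: look up the group's index labels, then select them from the original frame
rows_labels = DF.loc[groupByObj.groups['b']]

print(rows_mask)
print(rows_labels)
```

Both selections read from DF itself, so they should not depend on whatever the groupby object has cached internally.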

Upvotes: 1
