groupby of a groupby to select values in pandas

Question

I have a data frame as follows:

marker    date         value       identifier
EA    2007-01-01      0.33            55
EA    2007-01-01      0.73            56
EA    2007-01-01      0.51            57
EA    2007-02-01      0.13            55
EA    2007-02-01      0.23            57
EA    2007-03-01      0.82            55
EA    2007-03-01      0.88            56
EB    2007-01-01      0.13            45
EB    2007-01-01      0.74            46
EB    2007-01-01      0.56            47
EB    2007-02-01      0.93            45
EB    2007-02-01      0.23            47
EB    2007-03-01      0.82            45
EB    2007-03-01      0.38            46
EB    2007-03-01      0.19            47

Now I want to select the rows of this data frame for a single marker, so I use

df.groupby('marker').get_group('EA')

But I also want the mean of value for each date. Since the date index contains duplicates, I need a second groupby, and because the index of the selected group differs from that of df, I have to select the group again to build the key, leading to

df.groupby('marker').get_group('EA').groupby(df.groupby('marker').get_group('EA').index.date).mean()['value'].plot()

which clearly is not very legible. How can I accomplish this without creating an intermediate variable?

Ami Tavory · Accepted Answer

You can't, for the reason you wrote above in your comment about the AssertionError. Pandas expects to do the (second) groupby according to some sequence which has exactly the same length as the DataFrame getting grouped. If you're unwilling to first create a DataFrame describing the EA values, you're basically stuck with creating it again on the fly.
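To make the length requirement concrete, here is a minimal sketch on a small made-up frame (not the asker's data). It assumes date is an ordinary column and passes the keys as a plain NumPy array; the exact exception type has varied between pandas versions, so both AssertionError and ValueError are caught.

```python
import pandas as pd

# Small illustrative frame; the values are made up for this sketch.
df = pd.DataFrame({
    "marker": ["EA", "EA", "EB", "EB", "EB"],
    "date": pd.to_datetime(
        ["2007-01-01", "2007-01-01", "2007-01-01", "2007-02-01", "2007-02-01"]
    ),
    "value": [0.33, 0.73, 0.13, 0.93, 0.23],
})

eas = df[df.marker == "EA"]                # 2 rows
full_length_keys = df["date"].to_numpy()   # 5 keys, one per row of df

# Grouping the 2-row selection by 5 keys is rejected.
try:
    eas["value"].groupby(full_length_keys).mean()
    mismatch_rejected = False
except (AssertionError, ValueError):
    mismatch_rejected = True

# Keys taken from the selection itself have the right length.
matching_keys = eas["date"].to_numpy()
per_date = eas["value"].groupby(matching_keys).mean()
```

This is why the key has to be derived from the EA selection rather than from df.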

Not only is that less legible, it is unnecessarily expensive. Speaking of which, I'd rewrite your code like this:

eas = df[df.marker == 'EA']
eas.value.groupby(eas.date).mean().plot();

Doing a groupby and retaining a single group is a very expensive way of just filtering according to the key.
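For reference, here is a self-contained sketch of the filter-then-group approach on the EA rows from the question (plus one EB row so the filter has something to drop). It assumes date is a regular column, as in the table shown. The chained variant at the end is not from the answer above; it is one possible way to avoid naming the intermediate, by grouping on a column label instead of an external sequence.

```python
import pandas as pd

# EA rows copied from the question, plus one EB row to be filtered out.
df = pd.DataFrame({
    "marker": ["EA"] * 7 + ["EB"],
    "date": pd.to_datetime([
        "2007-01-01", "2007-01-01", "2007-01-01",
        "2007-02-01", "2007-02-01",
        "2007-03-01", "2007-03-01",
        "2007-01-01",
    ]),
    "value": [0.33, 0.73, 0.51, 0.13, 0.23, 0.82, 0.88, 0.13],
    "identifier": [55, 56, 57, 55, 57, 55, 56, 45],
})

# Filter with a boolean mask, then group the selection by its own dates.
eas = df[df.marker == "EA"]
means = eas.value.groupby(eas.date).mean()

# Possible chained variant (an assumption, not part of the answer):
# a callable in .loc filters in place, and grouping by the column name
# means no separately built key sequence is needed.
chained = (
    df.loc[lambda d: d.marker == "EA"]
      .groupby("date")["value"]
      .mean()
)
```

Appending .plot() to either result gives the plot from the question, provided matplotlib is installed.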
