Reputation: 20101
I have a data frame as follows:
marker date value identifier
EA 2007-01-01 0.33 55
EA 2007-01-01 0.73 56
EA 2007-01-01 0.51 57
EA 2007-02-01 0.13 55
EA 2007-02-01 0.23 57
EA 2007-03-01 0.82 55
EA 2007-03-01 0.88 56
EB 2007-01-01 0.13 45
EB 2007-01-01 0.74 46
EB 2007-01-01 0.56 47
EB 2007-02-01 0.93 45
EB 2007-02-01 0.23 47
EB 2007-03-01 0.82 45
EB 2007-03-01 0.38 46
EB 2007-03-01 0.19 47
Now I want to select the rows for one marker value, so I use
df.groupby('marker').get_group('EA')
But I also want the mean of the value per date, and since the date index is duplicated I need a second groupby. Because the filtered frame's index differs from the original one, I end up repeating the first groupby inside the second, leading to
df.groupby('marker').get_group('EA').groupby(df.groupby('marker').get_group('EA').index.date)['value'].mean().plot()
which is clearly not very legible. How can I accomplish this without creating an intermediate variable?
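For reference, here is a runnable sketch of the one-liner above. It assumes the table is loaded with date parsed into a (duplicated) DatetimeIndex, which is what the .index.date call implies; the construction of df is a hypothetical reconstruction, not the asker's loading code. The final .plot() is left as a comment so the snippet runs without matplotlib.

```python
import pandas as pd

# Hypothetical reconstruction of the question's table; 'date' becomes a
# DatetimeIndex with duplicated labels, as the .index.date call assumes.
df = pd.DataFrame(
    {
        "marker": ["EA"] * 7 + ["EB"] * 8,
        "date": pd.to_datetime(
            ["2007-01-01"] * 3 + ["2007-02-01"] * 2 + ["2007-03-01"] * 2
            + ["2007-01-01"] * 3 + ["2007-02-01"] * 2 + ["2007-03-01"] * 3
        ),
        "value": [0.33, 0.73, 0.51, 0.13, 0.23, 0.82, 0.88,
                  0.13, 0.74, 0.56, 0.93, 0.23, 0.82, 0.38, 0.19],
        "identifier": [55, 56, 57, 55, 57, 55, 56,
                       45, 46, 47, 45, 47, 45, 46, 47],
    }
).set_index("date")

# The EA subset is built twice: the key for the second groupby has to be
# derived from the subset itself, not from the full 15-row frame.
ea_means = (
    df.groupby("marker").get_group("EA")
    .groupby(df.groupby("marker").get_group("EA").index.date)["value"]
    .mean()
)
print(ea_means)
# ea_means.plot()  # drawing the plot requires matplotlib
```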
Upvotes: 1
Views: 1376
Reputation: 76297
You can't, for the reason you mentioned in your comment about the AssertionError. Pandas expects the (second) groupby key to be a sequence with exactly the same length as the DataFrame being grouped. If you're unwilling to first create a DataFrame holding the EA rows, you're basically stuck with recreating it on the fly.
Not only is that less legible, it is also unnecessarily expensive. With that in mind, I'd rewrite your code like this:
eas = df[df.marker == 'EA']
eas.value.groupby(eas.date).mean().plot();
Doing a groupby and retaining a single group is a very expensive way of just filtering according to the key.
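A runnable sketch of the rewrite above on the same sample data. Here date is assumed to be an ordinary column (as eas.date requires) rather than the index; the DataFrame construction is a hypothetical reconstruction of the table. It also checks that the boolean filter returns the same rows as groupby('marker').get_group('EA'): both routes yield the EA rows, but the groupby route first has to work out the groups for every marker value.

```python
import pandas as pd

# Hypothetical reconstruction of the question's table, keeping 'date' as a
# regular column because the answer's code accesses it as eas.date.
df = pd.DataFrame(
    {
        "marker": ["EA"] * 7 + ["EB"] * 8,
        "date": pd.to_datetime(
            ["2007-01-01"] * 3 + ["2007-02-01"] * 2 + ["2007-03-01"] * 2
            + ["2007-01-01"] * 3 + ["2007-02-01"] * 2 + ["2007-03-01"] * 3
        ),
        "value": [0.33, 0.73, 0.51, 0.13, 0.23, 0.82, 0.88,
                  0.13, 0.74, 0.56, 0.93, 0.23, 0.82, 0.38, 0.19],
        "identifier": [55, 56, 57, 55, 57, 55, 56,
                       45, 46, 47, 45, 47, 45, 46, 47],
    }
)

# Boolean-mask filtering: select the EA rows directly.
eas = df[df.marker == "EA"]

# The groupby route returns the same rows, but only after grouping all markers.
via_group = df.groupby("marker").get_group("EA")
same_rows = eas.equals(via_group)
print(same_rows)

# Mean value per date for the EA rows, as in the answer
# (append .plot() to draw it; that step needs matplotlib).
ea_means = eas.value.groupby(eas.date).mean()
print(ea_means)
```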
Upvotes: 1