Reputation: 2572
I have a set of rows that I want to group by the value of an identifier present in each row, and then do further isolated processing on the resulting groups.
My dataframe looks like this:
In [50]: df
Out[50]:
groupkey b c d e date
0 C1 b1 c1 d1 e1 2014-10-26 12:13:14
1 C2 NaN c2 d2 e2 2014-11-02 12:13:14
2 C1 b3 c3 d3 e3 2014-11-09 12:13:14
3 C1 b4 NaN d4 e4 2014-11-16 12:13:14
4 C3 b5 c5 d5 e5 2014-11-23 12:13:14
5 C2 b6 c6 d6 e6 2014-11-30 12:13:14
If I were to group by groupkey, I know I should just work on the GroupBy object returned by:
>> df.groupby('groupkey')
However, before grouping, and with the parallel goal of reducing the size of my dataset, I want to exclude any rows that would end up alone in their group (if grouped in the manner described above).
In my example that would mean that row 4 (the only C3 row) should be left out.
Now, it seems to me that the easiest way to count the records per group would of course entail grouping first and then counting the records, like this:
>> df.groupby('groupkey').count()
I suppose I could do this and then drop the groups that only have one record, but I am not sure how to do that without manually going back over the grouped result.
I was wondering if there's a way to group by some function that would take this condition into consideration while grouping? For concreteness, the two-step route I'd like to avoid is sketched below.
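(A sketch of the manual two-step route; size() is used rather than count() here, since size() counts rows while count() counts non-NaN values per column.)
sizes = df.groupby('groupkey').size()       # C1: 3, C2: 2, C3: 1
keep = sizes[sizes > 1].index               # group keys to keep
df_reduced = df[df['groupkey'].isin(keep)]  # row 4 (C3) is dropped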
Thanks for the help
Upvotes: 2
Views: 263
Reputation: 863226
I think you can first filter the DataFrame by value_counts with map and boolean indexing:
print(df.groupkey.value_counts() != 1)
C1 True
C2 True
C3 False
Name: groupkey, dtype: bool
print(df.groupkey.map(df.groupkey.value_counts() != 1))
0 True
1 True
2 True
3 True
4 False
5 True
Name: groupkey, dtype: bool
print(df[df.groupkey.map(df.groupkey.value_counts() != 1)])
groupkey b c d e date
0 C1 b1 c1 d1 e1 2014-10-26 12:13:14
1 C2 NaN c2 d2 e2 2014-11-02 12:13:14
2 C1 b3 c3 d3 e3 2014-11-09 12:13:14
3 C1 b4 NaN d4 e4 2014-11-16 12:13:14
5 C2 b6 c6 d6 e6 2014-11-30 12:13:14
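For reference, a self-contained version of the above (a sketch: the sample frame is rebuilt inline, the dates are generated with a weekly date_range, and print is written as a function for Python 3):
import pandas as pd
import numpy as np

# rebuild the sample frame from the question
df = pd.DataFrame({
    'groupkey': ['C1', 'C2', 'C1', 'C1', 'C3', 'C2'],
    'b': ['b1', np.nan, 'b3', 'b4', 'b5', 'b6'],
    'c': ['c1', 'c2', 'c3', np.nan, 'c5', 'c6'],
    'd': ['d1', 'd2', 'd3', 'd4', 'd5', 'd6'],
    'e': ['e1', 'e2', 'e3', 'e4', 'e5', 'e6'],
    'date': pd.date_range('2014-10-26 12:13:14', periods=6, freq='7D'),
})

# value_counts gives each group's size; map broadcasts that size back
# onto every row, and the boolean mask keeps only multi-row groups
mask = df.groupkey.map(df.groupkey.value_counts()) != 1
print(df[mask])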
Interestingly, it is faster than the filter solution (len(df) = 6k):
df = pd.concat([df]*1000).reset_index(drop=True)
In [21]: %timeit df[df.groupkey.map(df.groupkey.value_counts() != 1)]
1000 loops, best of 3: 1.87 ms per loop
In [22]: %timeit df.groupby('groupkey').filter(lambda x: len(x) != 1)
100 loops, best of 3: 2.71 ms per loop
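For completeness, a third vectorized option (my addition, not benchmarked above) broadcasts the group sizes with transform:
# transform('size') returns a row-aligned Series of group sizes,
# so no intermediate value_counts lookup is needed
print(df[df.groupby('groupkey')['groupkey'].transform('size') > 1])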
Upvotes: 2
Reputation: 394179
You want to filter the groupby object using len on the groups:
In [9]:
df.groupby('groupkey').filter(lambda x: len(x) > 1)
Out[9]:
groupkey b c d e date
0 C1 b1 c1 d1 e1 2014-10-26 12:13:14
1 C2 NaN c2 d2 e2 2014-11-02 12:13:14
2 C1 b3 c3 d3 e3 2014-11-09 12:13:14
3 C1 b4 NaN d4 e4 2014-11-16 12:13:14
5 C2 b6 c6 d6 e6 2014-11-30 12:13:14
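The same thing with a named predicate, to make the mechanics explicit: filter calls the predicate once per group, passing that group's sub-DataFrame, and keeps the whole group when the predicate returns True.
def has_multiple_rows(group):
    # group is the sub-DataFrame for one groupkey value
    return len(group) > 1

print(df.groupby('groupkey').filter(has_multiple_rows))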
Upvotes: 2