Reputation: 382
The title is confusing. Say I have a dataframe with one column, id, which occurs multiple times throughout the dataframe, and another column, let's call it cumulativeOccurrences. How do I select all unique occurrences of id such that the other column fulfills a certain condition, say cumulativeOccurrences > 20, for each and every instance of that id?
The beginning of the code is probably something like this:
dataframe.groupby('id')
But I can't figure out the rest.
Here is a small sample dataset that should return zero ids:
id cumulativeOccurrences
5494178 136
5494178 71
5494178 18
5494178 83
5494178 57
5494178 181
5494178 13
5494178 10
5494178 90
5494178 4484
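For reference, the sample can be rebuilt like this (a minimal sketch; the name df matches the code below). Note that three of the values, 18, 13 and 10, are not greater than 20, which is why no id should survive the filter:
import pandas as pd

# sample from the question: a single id whose group contains values <= 20
df = pd.DataFrame({'id': [5494178] * 10,
                   'cumulativeOccurrences': [136, 71, 18, 83, 57,
                                             181, 13, 10, 90, 4484]},
                  columns=['id', 'cumulativeOccurrences'])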
Okay, here is the result I got after more muddling around:
# aggregate each id's group to a single True/False: are all values > 20?
res = df[['id','cumulativeOccurrences']].groupby(['id']).agg(
    {'cumulativeOccurrences': [lambda x: all([e > 20 for e in x])]})
ids = res[res.cumulativeOccurrences['<lambda>'] == True].index
This gives me the ids which fulfill the condition. There is probably a better way than the lambda with a list comprehension inside agg, though. Any ideas?
Upvotes: 1
Views: 73
Reputation: 863156
First filter and then use DataFrameGroupBy.all:
res = (df['cumulativeOccurrences'] > 20).groupby(df['id']).all()
ids = res.index[res]
print (ids)
Int64Index([5494172], dtype='int64', name='id')
(Output from the answerer's own test data; with the question's sample, three values are not above 20, so the resulting index is empty.)
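If you also need the matching rows rather than just the ids, a transform-based variant should work too (a sketch, not from the original answer; it assumes a pandas version where transform accepts the 'all' reduction):
# broadcast each group's all() result back to every row of that group
mask = (df['cumulativeOccurrences'] > 20).groupby(df['id']).transform('all')
rows = df[mask]                # every row of every qualifying id
ids = rows['id'].unique()      # same ids as res.index[res]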
EDIT1: The first timings are for a non-sorted id, the second for a sorted one.
import numpy as np
import pandas as pd

np.random.seed(123)
N = 10000000
# benchmark frame: 1000 distinct ids, values drawn from [19, 5000)
df = pd.DataFrame({'id': np.random.randint(1000, size=N),
                   'cumulativeOccurrences': np.random.randint(19, 5000, size=N)},
                  columns=['id', 'cumulativeOccurrences'])
print (df.head())
In [125]: %%timeit
...: res = (df['cumulativeOccurrences'] > 20).groupby(df['id']).all()
...: ids = res.index[res]
...:
1 loop, best of 3: 1.22 s per loop
In [126]: %%timeit
...: res = df[['id','cumulativeOccurrences']].groupby(['id']).agg({'cumulativeOccurrences':[lambda x: all([e > 20 for e in x])]})
...: ids = res[res.cumulativeOccurrences['<lambda>']==True].index
...:
1 loop, best of 3: 3.69 s per loop
In [128]: %%timeit
...: res = df['cumulativeOccurrences'].groupby(df['id']).agg(lambda x: all([e > 20 for e in x]))
...: ids = res.index[res]
...:
1 loop, best of 3: 3.63 s per loop
np.random.seed(123)
N = 10000000
# same frame as above, but sorted by id with a fresh unique index
df = pd.DataFrame({'id': np.random.randint(1000, size=N),
                   'cumulativeOccurrences': np.random.randint(19, 5000, size=N)},
                  columns=['id', 'cumulativeOccurrences']).sort_values('id').reset_index(drop=True)
print (df.head())
In [130]: %%timeit
...: res = (df['cumulativeOccurrences'] > 20).groupby(df['id']).all()
...: ids = res.index[res]
...:
1 loop, best of 3: 795 ms per loop
In [131]: %%timeit
...: res = df[['id','cumulativeOccurrences']].groupby(['id']).agg({'cumulativeOccurrences':[lambda x: all([e > 20 for e in x])]})
...: ids = res[res.cumulativeOccurrences['<lambda>']==True].index
...:
1 loop, best of 3: 3.23 s per loop
In [132]: %%timeit
...: res = df['cumulativeOccurrences'].groupby(df['id']).agg(lambda x: all([e > 20 for e in x]))
...: ids = res.index[res]
...:
1 loop, best of 3: 3.15 s per loop
Conclusion - sorting by id and having a unique index can improve performance. The data was tested with pandas 0.20.3 under Python 3.
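As a quick sanity check (my own sketch, not part of the timings), the fast and the slow variant can be compared on the same frame:
# both approaches should flag exactly the same ids
fast = (df['cumulativeOccurrences'] > 20).groupby(df['id']).all()
slow = df['cumulativeOccurrences'].groupby(df['id']).agg(
    lambda x: all(e > 20 for e in x))
assert fast.equals(slow.astype(bool))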
Upvotes: 2