Reputation: 27
Hello guys, I have a problem with a function that I want to implement in my code. Assume that I am working with this data frame:
df = pd.DataFrame([[100, 1],[100, 1],[200, 2],[200, 2],[200, 2]], columns=['a','b'])
Now I would like to first count the entries for each unique value of column "a", and then select only those elements of column "a" whose count is bigger than 3:
group=df.groupby('a').count()
filter=group['b'].isin([3])
The desired output is a list that contains ONLY those elements of column "a" that satisfy the filter condition (named "filter"), so that I can use this list to filter the initial data frame and keep only rows 2, 3 and 4 (counting from zero).
I hope my intent is clear, but if I am approaching the problem from the wrong angle, any suggestion is welcome.
Upvotes: 0
Views: 568
Reputation: 2500
In [1]: import pandas as pd
In [2]: df = pd.DataFrame([[100, 1],[100, 1],[200, 2],[200, 2],[200, 2]], columns=['a','b'])
In [3]: pd.concat([i[1] for i in df.groupby('a') if len(i[1]) >2 ])
Out[3]:
     a  b
2  200  2
3  200  2
4  200  2
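The same selection can also be sketched with `GroupBy.filter`, which keeps all rows of every group for which the given function returns True. The threshold of more than 2 rows follows the comprehension above; the name `result` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame([[100, 1], [100, 1], [200, 2], [200, 2], [200, 2]],
                  columns=['a', 'b'])

# Keep only the groups of 'a' that contain more than 2 rows.
# The original index and row order are preserved.
result = df.groupby('a').filter(lambda g: len(g) > 2)
print(result)
```

This avoids building the intermediate list of sub-frames that `pd.concat` needs.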
Upvotes: 0
Reputation: 153500
IIUC, I don't think you have enough test data to test "bigger than 3"; however, you can test "bigger than 2".
df[df.groupby('a')['a'].transform('count').gt(2)]
Output:
     a  b
2  200  2
3  200  2
4  200  2
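To see what the one-liner does, and to get the list of "a" values that the question asks for, it can be split into steps. This is only a sketch; the names `counts`, `mask`, `keep` and `filtered` are illustrative and not part of the original code.

```python
import pandas as pd

df = pd.DataFrame([[100, 1], [100, 1], [200, 2], [200, 2], [200, 2]],
                  columns=['a', 'b'])

# transform('count') returns one value per row: the size of that row's 'a' group
counts = df.groupby('a')['a'].transform('count')

# Boolean mask aligned with df's index: True where the group has more than 2 rows
mask = counts.gt(2)

# The values of 'a' that satisfy the condition, as a plain Python list
keep = df.loc[mask, 'a'].unique().tolist()

# Filtering with that list selects the same rows as df[mask]
filtered = df[df['a'].isin(keep)]
print(keep)
print(filtered)
```

Because `transform` returns a result with the same index as `df`, the mask can be used directly for boolean indexing without merging the counts back in.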
Upvotes: 1