Reputation: 23
a = df.groupby('value').size()
newFrame = pd.DataFrame()
for el in a.keys():
    if a[el] > 300000:
        newFrame = pd.concat([newFrame, df[df.value == el]])
I have written this code, which does what I want, but it is really slow. I only want to keep the rows whose 'value' entry occurs in more than 300000 rows; if a value occurs less often than that, I want to drop all of its rows.
Upvotes: 2
Views: 168
Reputation: 862901
Use GroupBy.transform with GroupBy.size to get a Series of per-group counts the same length as the original DataFrame, then filter by boolean indexing:
df = df[df.groupby('value')['value'].transform('size') > 300000]
If you will process the output later, add .copy() to avoid a SettingWithCopyWarning:
df = df[df.groupby('value')['value'].transform('size') > 300000].copy()
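A minimal runnable sketch of the transform approach on a toy DataFrame (names and data are made up, and the threshold is lowered to 3 so the effect is visible on a few rows):

```python
import pandas as pd

# Toy frame: 'a' occurs 4 times, 'b' twice, 'c' once.
df = pd.DataFrame({'value': ['a', 'a', 'a', 'b', 'b', 'a', 'c']})

# transform('size') returns a Series aligned with df: every row gets the
# size of its group, so the boolean mask can index df directly.
filtered = df[df.groupby('value')['value'].transform('size') > 3].copy()

# Only the 'a' rows survive; their original index labels are preserved.
print(filtered)
```

Because the mask is computed in one vectorized pass, this avoids the per-value Python loop and repeated concat of the original code.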
Upvotes: 1
Reputation: 323306
Just use value_counts to find the values occurring 300000 times or fewer, then drop the rows that contain them (note that value_counts().index holds the unique values, not row labels, so match them back to rows with isin before dropping):

rare = df.value.value_counts().loc[lambda x: x <= 300000].index
df = df.drop(df[df.value.isin(rare)].index)
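A small sketch of the value_counts route on made-up data, with the threshold lowered to 3 for illustration (the variable name `rare` is my own, not from the answer):

```python
import pandas as pd

# Toy frame: 'a' occurs 4 times, 'b' twice, 'c' once.
df = pd.DataFrame({'value': ['a', 'a', 'a', 'a', 'b', 'b', 'c']})

# Unique values that appear 3 times or fewer.
rare = df.value.value_counts().loc[lambda x: x <= 3].index

# Drop the rows whose 'value' is one of those rare values.
result = df.drop(df[df.value.isin(rare)].index)
print(result)
```

This computes the counts once over the unique values, so like the transform version it sidesteps the slow per-value loop.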
Upvotes: 1