Asfangen

Reputation: 23

How to keep only the rows where a value in a column appears often enough

a = df.groupby('value').size()
newFrame = pd.DataFrame()

for el in a.keys():
    if a[el] > 300000:
        newFrame = pd.concat([newFrame, df[df.value == el]])

I have written this code, which does what I want but is really slow. I only want to keep the rows whose 'value' entry occurs in more than 300000 rows. If a value occurs less often than that, I want to drop its rows.

Upvotes: 2

Views: 168

Answers (2)

jezrael

Reputation: 862901

Use GroupBy.transform with 'size' to get a Series the same length as the original DataFrame, filled with the row count of each row's group, then filter by boolean indexing:

df = df[df.groupby('value')['value'].transform('size') > 300000]

If you will modify the filtered DataFrame later, add .copy() to avoid a SettingWithCopyWarning:

df = df[df.groupby('value')['value'].transform('size') > 300000].copy()
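As a minimal sketch of the transform approach on toy data (the column "x", the sample values, and the threshold of 2 are made up for illustration; the question uses 300000):

```python
import pandas as pd

# Toy frame: "a" occurs 3 times, "b" twice, "c" once.
df = pd.DataFrame({"value": ["a", "b", "a", "c", "a", "b"], "x": range(6)})
threshold = 2  # stand-in for 300000

# transform('size') returns one count per row, aligned with df's index,
# so it can be compared against the threshold and used as a boolean mask.
counts = df.groupby("value")["value"].transform("size")
filtered = df[counts > threshold].copy()

print(filtered)
```

Only the rows with value "a" should remain, because this avoids the Python-level loop and the repeated pd.concat calls from the question.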

Upvotes: 1

BENY

Reputation: 323306

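Use value_counts to find the values that occur often enough, then keep the matching rows with isin (DataFrame.drop removes rows by index label, not by column value):

df = df[df['value'].isin(df['value'].value_counts().loc[lambda x: x > 300000].index)]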
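A small sketch of the value_counts idea on toy data, selecting rows by column value with isin rather than by index label (the column "x", the sample values, and the threshold of 2 are assumptions for illustration):

```python
import pandas as pd

# Toy frame: 10 occurs 3 times, 20 twice, 30 once.
df = pd.DataFrame({"value": [10, 20, 10, 30, 10, 20], "x": range(6)})
threshold = 2  # stand-in for 300000

# value_counts is indexed by the distinct values of the column.
vc = df["value"].value_counts()
frequent = vc[vc > threshold].index

# isin builds a row-wise boolean mask from those frequent values.
kept = df[df["value"].isin(frequent)]

print(kept)
```

Here only the rows with value 10 should be kept.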

Upvotes: 1
