How take the same DataFrame with a condition on value counts of a column in pandas?

Question

I have a pandas DataFrame with shape(1000,8) so i want to make new DataFrame but with condition in one column but not an easy condition it is the value counts of on the specific row, lets say for example we have a column where df.column1 = [1,2,2,2,3,3,4,5,8,8,8,8] i to have the same DataFrame with the sames columns but with condition on column1, i want only rows where the values of column1 are repeated more than 3 times, so i get : df.column1 = [8,8,8,8]

ansev · Accepted Answer

Use GroupBy.filter:

Here is an example

import pandas as pd
# define df
df = pd.DataFrame()
df['column1'] = [1,2,2,2,3,3,4,5,8,8,8,8]
df['column2']=range(0,len(df['column1']))

Method 1

new_df=df.groupby('column1').filter(lambda x: x.column1.size>3)
print(new_df)
    column1  column2
8         8        8
9         8        9
10        8       10
11        8       11

Method 2

or Groupby.transform to performance a boolean indexing:

new_df=df[df.groupby('column1').column1.transform('size')>3]
print(new_df)

8         8        8
9         8        9
10        8       10
11        8       11

Method 3

Finally if you want use value_counts is better if you use Series.map:

new_df=df[df.column1.map(df.column1.value_counts())>3]
print(new_df)
    column1  column2
8         8        8
9         8        9
10        8       10
11        8       11

How take the same DataFrame with a condition on value counts of a column in pandas?

Answers (2)

Related Questions