Axibord
Axibord

Reputation: 5

How take the same DataFrame with a condition on value counts of a column in pandas?

I have a pandas DataFrame with shape(1000,8) so i want to make new DataFrame but with condition in one column but not an easy condition it is the value counts of on the specific row, lets say for example we have a column where df.column1 = [1,2,2,2,3,3,4,5,8,8,8,8] i to have the same DataFrame with the sames columns but with condition on column1, i want only rows where the values of column1 are repeated more than 3 times, so i get : df.column1 = [8,8,8,8]

Upvotes: 0

Views: 516

Answers (2)

fmarm
fmarm

Reputation: 4284

You can use value_counts and keep only the most frequent values

import pandas as pd
# define df
df = pd.DataFrame()
df['column1'] = [1,2,2,2,3,3,4,5,8,8,8,8] 

#get counts
counts = df['column1'].value_counts()

# keep only counts>3
counts = counts[counts>3]

# get the index to see which column1 values should be kept
to_keep = counts.index

# filter df with only correct values of column1
df.loc[df['column1'].isin(to_keep)]

#   column1
#8  8
#9  8
#10 8
#11 8

Upvotes: 1

ansev
ansev

Reputation: 30930

Use GroupBy.filter:

Here is an example

import pandas as pd
# define df
df = pd.DataFrame()
df['column1'] = [1,2,2,2,3,3,4,5,8,8,8,8]
df['column2']=range(0,len(df['column1']))

Method 1

new_df=df.groupby('column1').filter(lambda x: x.column1.size>3)
print(new_df)
    column1  column2
8         8        8
9         8        9
10        8       10
11        8       11

Method 2

or Groupby.transform to performance a boolean indexing:

new_df=df[df.groupby('column1').column1.transform('size')>3]
print(new_df)

8         8        8
9         8        9
10        8       10
11        8       11

Method 3

Finally if you want use value_counts is better if you use Series.map:

new_df=df[df.column1.map(df.column1.value_counts())>3]
print(new_df)
    column1  column2
8         8        8
9         8        9
10        8       10
11        8       11

Upvotes: 0

Related Questions