Reputation: 5
I have a pandas DataFrame with shape (1000, 8).
I want to build a new DataFrame from it, with a condition on one column, and it is not a simple condition: it depends on the value counts in that column. For example, say we have a column where df.column1 = [1,2,2,2,3,3,4,5,8,8,8,8].
I want the same DataFrame, with the same columns, but keeping only the rows where the value of column1 is repeated more than 3 times, so I get: df.column1 = [8,8,8,8]
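Something like this minimal example of the input (a sketch; the real frame has 8 columns, column1 is just one of them):
import pandas as pd
# toy version of the real (1000, 8) DataFrame, keeping only the relevant column
df = pd.DataFrame({'column1': [1, 2, 2, 2, 3, 3, 4, 5, 8, 8, 8, 8]})
# desired result: only the rows whose column1 value occurs more than 3 times,
# here the four rows where column1 == 8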
Upvotes: 0
Views: 516
Reputation: 4284
You can use value_counts and keep only the values that appear more than 3 times:
import pandas as pd
# define df
df = pd.DataFrame()
df['column1'] = [1,2,2,2,3,3,4,5,8,8,8,8]
# get counts
counts = df['column1'].value_counts()
# keep only the values that appear more than 3 times
counts = counts[counts > 3]
# get the index to see which column1 values should be kept
to_keep = counts.index
# filter df, keeping only rows whose column1 value is in to_keep
df.loc[df['column1'].isin(to_keep)]
# column1
#8 8
#9 8
#10 8
#11 8
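If you prefer a single expression, the same steps can be chained into one line (a sketch, equivalent to the code above):
df.loc[df['column1'].isin(df['column1'].value_counts().loc[lambda s: s > 3].index)]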
Upvotes: 1
Reputation: 30930
Use GroupBy.filter. Here is an example:
import pandas as pd
# define df
df = pd.DataFrame()
df['column1'] = [1,2,2,2,3,3,4,5,8,8,8,8]
# add a second column so the example has more than one column
df['column2'] = range(len(df))
Method 1
new_df = df.groupby('column1').filter(lambda x: x.column1.size > 3)
print(new_df)
column1 column2
8 8 8
9 8 9
10 8 10
11 8 11
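filter calls the lambda once per group and keeps every row of the groups for which it returns True; since each group holds all rows sharing one column1 value, an equivalent sketch is:
new_df = df.groupby('column1').filter(lambda x: len(x) > 3)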
Method 2
Or use GroupBy.transform to perform boolean indexing:
new_df = df[df.groupby('column1').column1.transform('size') > 3]
print(new_df)
column1 column2
8 8 8
9 8 9
10 8 10
11 8 11
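transform('size') returns a Series aligned with the original index, holding the size of each row's group, so comparing it with 3 gives a boolean mask of the same length as df. A quick sketch of the intermediate:
print(df.groupby('column1').column1.transform('size').tolist())
# [1, 3, 3, 3, 2, 2, 1, 1, 4, 4, 4, 4]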
Method 3
Finally, if you want to use value_counts, it is better to combine it with Series.map:
new_df = df[df.column1.map(df.column1.value_counts()) > 3]
print(new_df)
column1 column2
8 8 8
9 8 9
10 8 10
11 8 11
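Here value_counts returns a Series indexed by the unique column1 values, and map looks each row's value up in it, producing the same per-row counts as transform('size') in Method 2. A quick sketch of the intermediate:
print(df.column1.map(df.column1.value_counts()).tolist())
# [1, 3, 3, 3, 2, 2, 1, 1, 4, 4, 4, 4]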
Upvotes: 0