
Remove rows whose value appears in 3 or more groups

I have a data frame and want to remove the rows whose value in column `b` appears in 3 or more groups (distinct values of column `a`). In the example below, `bike` appears in all three groups (`name1`, `name2` and `name3`), so every `bike` row should be removed. How can I do this?


df <- data.frame(a = c("name1","name1","name1","name2","name2","name2","name3"), b=c("car","bike","bus","train","bike","tour","bike"))
df
    a    b
 name1  car
 name1  bike
 name1  bus
 name2  train
 name2  bike
 name2  tour
 name3  bike

Expected Output:


 a      b
 name1  car
 name1  bus
 name2  train
 name2  tour
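One way to see which values of `b` meet the threshold is to count the distinct groups (values of `a`) that each one appears in. Below is a minimal base R sketch on the example data using `tapply()`. The variable name `n_groups` is purely illustrative.

```r
df <- data.frame(a = c("name1","name1","name1","name2","name2","name2","name3"),
                 b = c("car","bike","bus","train","bike","tour","bike"))

# For each value of b, count how many distinct groups (values of a) contain it
n_groups <- tapply(df$a, df$b, function(x) length(unique(x)))
n_groups  # bike is in 3 groups; every other value is in 1

# Values of b that appear in 3 or more groups
names(n_groups)[n_groups >= 3]
# [1] "bike"
```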

Upvotes: 0

Answers (4)

Raja


Using Base R:

df <- data.frame(a = c("name1","name1","name1","name2","name2","name2","name3"), b=c("car","bike","bus","train","bike","tour","bike"))
df

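# Count the distinct groups (a) that each value of b appears in
lst <- table(unique(df[, c("a", "b")])$b)
# %in% (rather than !=) works however many values reach the threshold
df[!df$b %in% names(lst)[lst >= 3], ]

#       a     b
# 1 name1   car
# 3 name1   bus
# 4 name2 train
# 6 name2  tour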
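Why `%in%` rather than `!=` when filtering against a vector of names: `!=` compares element by element and recycles the shorter vector, so once two or more values reach the threshold (or none do) it no longer means "not one of these values". The sketch below uses a hypothetical data frame `df2` in which both `bike` and `bus` appear in three groups.

```r
# Hypothetical data: bike and bus each appear in groups g1, g2 and g3
df2 <- data.frame(a = c("g1","g1","g2","g2","g3","g3","g3","g4"),
                  b = c("bus","bike","bike","bus","bike","bus","car","car"))

counts <- table(unique(df2[, c("a", "b")])$b)  # distinct groups per value of b
common <- names(counts)[counts >= 3]           # "bike" "bus"

# Element-wise != recycles `common` along df2$b, so some bike/bus rows survive
bad  <- df2[df2$b != common, ]
# %in% tests membership in the whole set, so only the car rows survive
good <- df2[!df2$b %in% common, ]
good  # rows 7 and 8 (both "car")
```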

Upvotes: 1

sm925


Using data.table:

library(data.table)
setDT(df)[, count := uniqueN(a), by = b] ## convert df to data.table (in place) & count the distinct groups (a) per value of b
df <- df[!(count >= 3), ] ## keep only rows whose b appears in fewer than 3 groups
df[, count := NULL] ## delete the helper column
df

      a     b
1: name1   car
2: name1   bus
3: name2 train
4: name2  tour
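Counting rows per value of `b` (what `.N` does) equals counting groups only when no group lists the same value twice; data.table's `uniqueN(a)` counts distinct groups instead. The base R sketch below illustrates the difference on a hypothetical data frame `df3` where `name1` contains `bike` twice.

```r
# Hypothetical data: name1 lists "bike" twice, name2 lists it once
df3 <- data.frame(a = c("name1", "name1", "name2"),
                  b = c("bike",  "bike",  "bike"))

# Rows per value of b (what .N or table() counts)
rows_per_b <- table(df3$b)
# Distinct groups per value of b (what uniqueN(a) counts)
groups_per_b <- tapply(df3$a, df3$b, function(x) length(unique(x)))

rows_per_b["bike"]    # 3 rows, so a ">= 3" row count would drop bike
groups_per_b["bike"]  # but bike is really in only 2 groups
```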

Upvotes: 2

Andrew Gustar


In base R you could use `ave()`:

df[ave(as.numeric(as.factor(df$a)),              # convert a to numbers (factor codes) so ave() returns numbers comparable with 3
       df$b,                                     # group by b
       FUN = function(x) length(unique(x))) < 3, ]  # keep rows whose b occurs in fewer than 3 distinct a's

      a     b
1 name1   car
3 name1   bus
4 name2 train
6 name2  tour

Upvotes: 2

slava-kohut


You can use dplyr::n_distinct:

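library(dplyr)

n_gr <- 3  # minimum number of distinct groups
# values of b that occur in at least n_gr distinct groups of a
cn <- df %>% group_by(b) %>% summarise(na = n_distinct(a)) %>%
  filter(na >= n_gr) %>% pull(b)

# keep only rows whose b is not one of those values
df <- df %>% filter(!(b %in% cn))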

Output

      a     b
1 name1   car
2 name1   bus
3 name2 train
4 name2  tour
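The same idea (count distinct groups per value, then drop values at or above a threshold) can be wrapped in a small reusable base R helper. This is a sketch only: the function name `drop_common_values()` and its arguments are hypothetical and not part of any package.

```r
# Hypothetical helper: drop rows whose `value` column appears in at least
# `min_groups` distinct groups of the `group` column
drop_common_values <- function(data, group, value, min_groups = 3) {
  pairs  <- unique(data[, c(group, value)])   # one row per (group, value) pair
  n      <- table(pairs[[value]])             # distinct groups per value
  common <- names(n)[n >= min_groups]
  out <- data[!data[[value]] %in% common, , drop = FALSE]
  rownames(out) <- NULL                       # renumber rows 1..n
  out
}

df <- data.frame(a = c("name1","name1","name1","name2","name2","name2","name3"),
                 b = c("car","bike","bus","train","bike","tour","bike"))
drop_common_values(df, group = "a", value = "b", min_groups = 3)
```

Because the filter uses `%in%`, a threshold that no value reaches (for example `min_groups = 4`) simply returns every row.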

Upvotes: 3
