Reputation: 1073
I have a data frame and I am trying to remove the rows whose value in column b appears in 3 or more groups (the groups are defined by column a). In the example below, bike appears in all 3 groups, so those rows need to be removed. How can I achieve this?
df <- data.frame(a = c("name1","name1","name1","name2","name2","name2","name3"), b=c("car","bike","bus","train","bike","tour","bike"))
df
a b
name1 car
name1 bike
name1 bus
name2 train
name2 bike
name2 tour
name3 bike
Expected Output:
a b
name1 car
name1 bus
name2 train
name2 tour
Upvotes: 0
Views: 51
Reputation: 167
Using Base R:
df <- data.frame(a = c("name1","name1","name1","name2","name2","name2","name3"), b=c("car","bike","bus","train","bike","tour","bike"))
df
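lst <- table(df$b) # number of rows for each value of b
df[!df$b %in% names(lst)[lst >= 3], ] # %in% (rather than !=) also works when several values, or none, reach the threshold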
# a b
# 1 name1 car
# 3 name1 bus
# 4 name2 train
# 6 name2 tour
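Note that table(df$b) counts rows, so a value repeated inside a single group would be counted more than once. A minimal base R sketch that counts distinct groups instead, assuming that "present in >= 3 groups" means three or more different values of a (the names n_groups, common and result are only illustrative):

```r
df <- data.frame(
  a = c("name1", "name1", "name1", "name2", "name2", "name2", "name3"),
  b = c("car", "bike", "bus", "train", "bike", "tour", "bike")
)

# Number of distinct groups (values of a) in which each value of b occurs
n_groups <- tapply(df$a, df$b, function(x) length(unique(x)))

# Values of b that occur in 3 or more groups
common <- names(n_groups)[n_groups >= 3]

# Keep only the rows whose b is not one of those values
result <- df[!df$b %in% common, ]
result
```

On the example data this should give the same four rows as above; the two approaches only differ when a value of b is repeated within one group.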
Upvotes: 1
Reputation: 2678
Using data.table:
library(data.table)
setDT(df)[, count := .N, by = b] ## convert df to data.table & create a column to count groups
df <- df[!(count >= 3), ] ## delete rows that have count equal to 3 or more than 3
df[, count := NULL] ## delete the column created
df
a b
1: name1 car
2: name1 bus
3: name2 train
4: name2 tour
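The count above uses .N, which is the number of rows per value of b. If a value can be repeated within the same group, counting distinct groups with uniqueN() may be closer to what the question asks. A sketch under that assumption (dt, keep and result are illustrative names; this variant does not modify the original table by reference):

```r
library(data.table)

dt <- data.table(
  a = c("name1", "name1", "name1", "name2", "name2", "name2", "name3"),
  b = c("car", "bike", "bus", "train", "bike", "tour", "bike")
)

# For each value of b, check whether it occurs in fewer than 3 distinct groups;
# the unnamed result column is called V1 by default
keep <- dt[, uniqueN(a) < 3, by = b][V1 == TRUE, b]

# Keep only the rows whose b passed the check
result <- dt[b %in% keep]
result
```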
Upvotes: 2
Reputation: 18425
In base R you could do this...
df[ave(as.numeric(as.factor(df$a)), # convert a to numbers (factor levels), as required by ave
       df$b,                        # group by b
       FUN = length) < 3, ]         # keep rows where the number of a's per b is less than 3
a b
1 name1 car
3 name1 bus
4 name2 train
6 name2 tour
Upvotes: 2
Reputation: 4233
You can use dplyr::n_distinct:
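library(dplyr)
n_gr <- 3 # threshold: number of groups
# values of b that occur in n_gr or more distinct groups
cn <- df %>% group_by(b) %>% summarise(na = n_distinct(a)) %>%
  filter(na >= n_gr) %>% pull(b)
# drop the rows with those values
df <- df %>% filter(!(b %in% cn))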
Output
a b
1 name1 car
2 name1 bus
3 name2 train
4 name2 tour
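The same idea can probably be written as a single pipeline, since filter() evaluates its condition per group on a grouped data frame. This is only a sketch of that variant, with result as an illustrative name; it returns a tibble rather than a plain data frame:

```r
library(dplyr)

df <- data.frame(
  a = c("name1", "name1", "name1", "name2", "name2", "name2", "name3"),
  b = c("car", "bike", "bus", "train", "bike", "tour", "bike")
)

n_gr <- 3

result <- df %>%
  group_by(b) %>%                  # one group per value of b
  filter(n_distinct(a) < n_gr) %>% # keep values seen in fewer than n_gr distinct groups
  ungroup()
result
```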
Upvotes: 3