kindofhungry

Reputation: 464

Counting how many times each factor level occurs in a data frame, then deleting the rows whose level occurs x times or fewer

I have a data frame where the type variable (df$type) is a factor.

df <- data.frame(
  type = c('a', 'a', 'b', 'c', 'b', 'b', 'c', 'a', 'd'),
  state = rep('Washington', 9)
)

Goal: a and b each appear 3 or more times, while c appears only twice and d only once. I want to delete the rows where df$type is c or d, so the new df should look like this:

df <- data.frame(
  type = c('a', 'a', 'b', 'b', 'b', 'a'),
  state = rep('Washington', 6)
)
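The per-level counts the goal refers to can be checked with table(). A quick sketch, rebuilding the example data so it runs on its own; the name counts is only illustrative:

```r
# Rebuild the example data so this snippet is self-contained
df <- data.frame(
  type = c("a", "a", "b", "c", "b", "b", "c", "a", "d"),
  state = rep("Washington", 9)
)

# Count how many rows fall under each value of type
counts <- table(df$type)
counts  # a: 3, b: 3, c: 2, d: 1
```

Any level whose count is at or below the chosen threshold is a candidate for removal.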

Upvotes: 1

Views: 44

Answers (2)

GordonShumway

Reputation: 2056

And using dplyr:

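library(dplyr)

df %>%
  group_by(type) %>%
  filter(n() > 2) %>%  # n() is the row count of the current group, so this keeps groups with 3+ rows
  ungroup()            # drop the grouping so later steps are no longer grouped by type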
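For comparison, the same group-size idea the dplyr pipeline relies on (n() inside group_by()) can be sketched in base R with ave(). This is an alternative approach, not part of the answer above, and the names group_size and kept are illustrative:

```r
df <- data.frame(
  type = c("a", "a", "b", "c", "b", "b", "c", "a", "d"),
  state = rep("Washington", 9)
)

# For every row, store the size of its type group (the base R analogue of n())
group_size <- ave(seq_len(nrow(df)), df$type, FUN = length)

# Keep rows whose group has more than 2 members, i.e. 3 or more
kept <- df[group_size > 2, ]
kept
```

Unlike the table() approach, ave() returns one value per row, so the filter condition lines up with the rows directly.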

Upvotes: 4

Mikael Poul Johannesson

Reputation: 1349

Here is a base R solution using table() and %in%.

df <- data.frame(
  type = c("a", "a", "b", "c", "b", "b", "c", "a", "d"),
  state = rep("Washington", 9)
)

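counts <- table(df$type)                # occurrences of each type
to_keep <- names(counts)[counts >= 3]   # types seen 3 or more times
df <- df[df$type %in% to_keep, ]        # keep only the rows of those types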

df

#>  type      state
#> 1    a Washington
#> 2    a Washington
#> 3    b Washington
#> 5    b Washington
#> 6    b Washington
#> 8    a Washington
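The title asks for a general threshold x ("less than or equal to x"). A minimal sketch that wraps the table() and %in% approach in a function; drop_rare_types and its arguments are hypothetical names, not from the original answer:

```r
# Hypothetical helper: drop rows whose value in column `col` occurs x times or fewer
drop_rare_types <- function(data, col, x) {
  counts <- table(data[[col]])                # occurrences of each value
  keep_levels <- names(counts)[counts > x]    # values seen more than x times
  data[data[[col]] %in% keep_levels, , drop = FALSE]
}

df <- data.frame(
  type = c("a", "a", "b", "c", "b", "b", "c", "a", "d"),
  state = rep("Washington", 9)
)

drop_rare_types(df, "type", x = 2)  # rows 1, 2, 3, 5, 6, 8, as in the output above
```

Using counts > x mirrors the title's wording directly: a level is removed exactly when its count is less than or equal to x. Note that table() ignores NA values, so rows with a missing type are dropped as well.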

Upvotes: 2
