Israel Motta

Reputation: 35

Remove columns with factors that have fewer than 5 observations per level

I have a dataset with more than 100 columns, all of type factor. For example:

          animal               fruit               vehicle              color 
             cat              orange                   car               blue 
             dog               apple                   bus              green 
             dog               apple                   car              green 
             dog              orange                   bus              green

In my dataset I need to remove every column that contains a factor level with fewer than 5 observations. In this example, if I want to remove all columns in which some level has 1 observation or fewer, like blue or cat, the algorithm should remove the columns animal and color. What is the most elegant way to do this?

Upvotes: 1

Views: 512

Answers (2)

Ronak Shah

Reputation: 389047

We can use select_if from dplyr, keeping only the columns in which every level occurs more than once:

library(dplyr)
df1 %>% select_if(~all(table(.) > 1))

#   fruit vehicle
#1 orange     car
#2  apple     bus
#3  apple     car
#4 orange     bus
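
For readers on dplyr 1.0.0 or later, where select_if is marked as superseded, the same idea can be sketched with select() and where(). This is only a variant of the call above, not a different method; the name min_obs is a hypothetical threshold variable added for illustration, and df1 is rebuilt here so the snippet runs on its own.

```r
library(dplyr)

# Same example data as in the question, rebuilt so this snippet is self-contained
df1 <- data.frame(
  animal  = factor(c("cat", "dog", "dog", "dog")),
  fruit   = factor(c("orange", "apple", "apple", "orange")),
  vehicle = factor(c("car", "bus", "car", "bus")),
  color   = factor(c("blue", "green", "green", "green"))
)

# Hypothetical threshold: a column is kept only if every level
# has at least this many observations (use 5 for the real dataset)
min_obs <- 2

# where() applies the predicate to each column and keeps those returning TRUE
kept <- df1 %>% select(where(~ all(table(.x) >= min_obs)))
names(kept)
```

Setting min_obs to 5 should give the rule asked for in the question; on this 4-row example that would drop every column.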

Upvotes: 0

akrun

Reputation: 887213

We can use Filter with table, dropping any column that has a level with fewer than 2 observations:

Filter(function(x) !any(table(x) < 2), df1)
#   fruit vehicle
#1 orange     car
#2  apple     bus
#3  apple     car
#4 orange     bus

data

df1 <- structure(list(animal = structure(c(1L, 2L, 2L, 2L), .Label = c("cat", 
"dog"), class = "factor"), fruit = structure(c(2L, 1L, 1L, 2L
), .Label = c("apple", "orange"), class = "factor"), vehicle = structure(c(2L, 
1L, 2L, 1L), .Label = c("bus", "car"), class = "factor"), color = structure(c(1L, 
2L, 2L, 2L), .Label = c("blue", "green"), class = "factor")),
row.names = c(NA, 
-4L), class = "data.frame")
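
To apply the threshold of 5 from the question rather than the threshold of 2 used in the example, the check can be wrapped in a small helper. This is a minimal base R sketch: keep_well_populated and its min_obs argument are hypothetical names, not part of any package. It also assumes that unused factor levels should be ignored. table() reports a count of 0 for a level that is declared but never appears, which would otherwise cause the column to be dropped, so the column is passed through factor() first to discard such levels.

```r
# Hypothetical helper: keep only columns in which every observed level
# has at least `min_obs` observations
keep_well_populated <- function(df, min_obs = 5) {
  ok <- vapply(
    df,
    # factor(x) drops unused levels, so they are not counted as 0
    function(x) all(table(factor(x)) >= min_obs),
    logical(1)
  )
  # drop = FALSE keeps the result a data frame even if one column remains
  df[, ok, drop = FALSE]
}

# Same example data as above, rebuilt so this snippet is self-contained
df1 <- data.frame(
  animal  = factor(c("cat", "dog", "dog", "dog")),
  fruit   = factor(c("orange", "apple", "apple", "orange")),
  vehicle = factor(c("car", "bus", "car", "bus")),
  color   = factor(c("blue", "green", "green", "green"))
)

res2 <- keep_well_populated(df1, min_obs = 2)  # keeps fruit and vehicle
res5 <- keep_well_populated(df1, min_obs = 5)  # no column qualifies here
```

With only 4 rows, no column in the example can reach 5 observations per level, so the default threshold returns a data frame with zero columns.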

Upvotes: 1
