Remove columns with factors that has less than 5 observations per level

Question

I have a dataset composed of more than 100 columns and all columns are of type factor. Ex:

          animal               fruit               vehicle              color 
             cat              orange                   car               blue 
             dog               apple                   bus              green 
             dog               apple                   car              green 
             dog              orange                   bus              green

In my dataset i need to remove all columns with factors thas has less than 5 observations per level. In this example, if i want to remove all columns with amount of observations per levels less than or equal to 1, like blue or cat, the algorithm will remove the columns animal and color. What is the most elegant way to do this?

akrun · Accepted Answer

We can use Filter with table

Filter(function(x) !any(table(x) < 2), df1)
#  fruit vehicle
#1 orange     car
#2  apple     bus
#3  apple     car
#4 orange     bus

data

df1 <- structure(list(animal = structure(c(1L, 2L, 2L, 2L), .Label = c("cat", 
"dog"), class = "factor"), fruit = structure(c(2L, 1L, 1L, 2L
), .Label = c("apple", "orange"), class = "factor"), vehicle = structure(c(2L, 
1L, 2L, 1L), .Label = c("bus", "car"), class = "factor"), color = structure(c(1L, 
2L, 2L, 2L), .Label = c("blue", "green"), class = "factor")),
row.names = c(NA, 
-4L), class = "data.frame")

Remove columns with factors that has less than 5 observations per level

Answers (2)

data

Related Questions