Reputation: 35
I have a dataset with more than 100 columns, all of type factor. For example:
animal fruit vehicle color
cat orange car blue
dog apple bus green
dog apple car green
dog orange bus green
In my dataset I need to remove every column that contains a factor level with fewer than 5 observations. In this example, if I want to remove all columns in which some level has 1 observation or fewer, like blue or cat, the algorithm would remove the columns animal and color. What is the most elegant way to do this?
Upvotes: 1
Views: 512
Reputation: 389047
We can use select_if from dplyr:
library(dplyr)
df1 %>% select_if(~all(table(.) > 1))
# fruit vehicle
#1 orange car
#2 apple bus
#3 apple car
#4 orange bus
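In dplyr 1.0.0 and later, select_if is superseded by select() combined with where(). Here is a minimal sketch of the same filter in that style, assuming a recent dplyr is installed. The variable min_n is a name introduced here for illustration: it is the minimum count every level must reach, so min_n <- 2 reproduces the "> 1" condition above and min_n <- 5 would match the threshold in the question.

```r
library(dplyr)

# Example data from the question, rebuilt so the snippet runs on its own
df1 <- data.frame(
  animal  = factor(c("cat", "dog", "dog", "dog")),
  fruit   = factor(c("orange", "apple", "apple", "orange")),
  vehicle = factor(c("car", "bus", "car", "bus")),
  color   = factor(c("blue", "green", "green", "green"))
)

# min_n is an illustrative name: the minimum number of rows per level
min_n <- 2

# Keep only the columns in which every level occurs at least min_n times
res <- df1 %>% select(where(~ all(table(.x) >= min_n)))
names(res)
# [1] "fruit"   "vehicle"
```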
Upvotes: 0
Reputation: 887213
We can use Filter with table:
Filter(function(x) !any(table(x) < 2), df1)
# fruit vehicle
#1 orange car
#2 apple bus
#3 apple car
#4 orange bus
# data
df1 <- structure(list(
  animal = structure(c(1L, 2L, 2L, 2L), .Label = c("cat", "dog"), class = "factor"),
  fruit = structure(c(2L, 1L, 1L, 2L), .Label = c("apple", "orange"), class = "factor"),
  vehicle = structure(c(2L, 1L, 2L, 1L), .Label = c("bus", "car"), class = "factor"),
  color = structure(c(1L, 2L, 2L, 2L), .Label = c("blue", "green"), class = "factor")
), row.names = c(NA, -4L), class = "data.frame")
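The Filter call above hard-codes a threshold of 2, while the question asks for 5 on the real data. One possible way to parameterize it in base R is sketched below. The function name drop_sparse_factors and its argument min_n are hypothetical, introduced only for this example. The sketch also calls droplevels first, because table() reports a count of 0 for factor levels that are declared but never observed, and such a level would otherwise cause its column to be dropped. Note that table() ignores NA values by default.

```r
# Hypothetical helper (not from the original answer): keep only the columns
# in which every observed level occurs at least `min_n` times.
drop_sparse_factors <- function(df, min_n = 5) {
  keep <- vapply(df, function(x) {
    # droplevels() removes unused levels so they do not count as 0
    counts <- table(droplevels(as.factor(x)))
    all(counts >= min_n)
  }, logical(1))
  df[, keep, drop = FALSE]  # drop = FALSE keeps a data frame even for one column
}

# Example data from the question, rebuilt so the snippet runs on its own
df1 <- data.frame(
  animal  = factor(c("cat", "dog", "dog", "dog")),
  fruit   = factor(c("orange", "apple", "apple", "orange")),
  vehicle = factor(c("car", "bus", "car", "bus")),
  color   = factor(c("blue", "green", "green", "green"))
)

names(drop_sparse_factors(df1, min_n = 2))
# [1] "fruit"   "vehicle"
```

On this four-row example, min_n = 5 would remove every column, since no level can reach 5 observations.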
Upvotes: 1