Daniil Yefimov

Reputation: 1012

How to remove one specific factor level from all factor variables in R?

For example, I have a data frame with 6 columns, all of which are factors.

I want to delete a specific level, for example "no", from every factor in which it appears.

That is, I want to drop the level "no" from each factor variable and, at the same time, set every answer that has the value "no" to NA.

I have tried this code:

sapply(fact,function(x) levels(x)[levels(x) == "no"] <- NULL)

But this code doesn't work.

How can I do this?

Upvotes: 2

Views: 6893

Answers (3)

Thomas Rosa

Reputation: 732

The other answers are great. I'll add that if not all of your columns are factors, and/or you want to keep every factor level other than the one being removed (including levels with no data), you'll need a more general approach:

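# Define a helper function: values equal to badlevel become NA and only
# that level is removed; all other levels (even unused ones) are kept.
# Rebuilding with factor() avoids droplevels() followed by levels<-, which
# renames levels by position and can mislabel data when another level is empty.
removeOneLevel <- function(v, badlevel) {
  factor(v, levels = setdiff(levels(v), badlevel))
}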

# Use dplyr to perform that function on all factor columns
library(dplyr)
dfNew <- mutate_if(df, is.factor, removeOneLevel, badlevel = 'no')
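
For readers without dplyr, the same idea can be sketched in base R. This is only an illustration, not part of the answer's code: `drop_one_level` and the data frame `survey` are hypothetical names, and the sketch assumes the level to remove is a single string.

```r
# Hypothetical helper: keep every level except `bad`;
# values equal to `bad` become NA
drop_one_level <- function(v, bad) {
  factor(v, levels = setdiff(levels(v), bad))
}

# Made-up example data: two factor columns and one numeric column
survey <- data.frame(
  q1  = factor(c("yes", "no", "maybe")),
  q2  = factor(c("no", "no", "yes")),
  age = c(31, 45, 28)
)

# Apply the helper to the factor columns only; other columns are untouched
is_fac <- vapply(survey, is.factor, logical(1))
survey[is_fac] <- lapply(survey[is_fac], drop_one_level, bad = "no")

levels(survey$q1)  # "maybe" "yes"
survey$age         # still numeric: 31 45 28
```

Selecting the factor columns with a logical index plays the same role as `is.factor` in `mutate_if`.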

Upvotes: 1

989

Reputation: 12937

How about this:

> df
#    c1  c2  c3
# 1 yes yes  no
# 2  no  ok yes
# 3  ok  no  ok
# 4 yes yes  no
# 5  no  ok yes
# 6  ok  no  ok
# 7 yes yes  no
# 8  no  ok yes
# 9  ok  no  ok

toRemove <- "no"
data.frame(lapply(df, 
          function(x) factor(as.character(x), levels=levels(x)[levels(x)!=toRemove])))

#     c1   c2   c3
# 1  yes  yes <NA>
# 2 <NA>   ok  yes
# 3   ok <NA>   ok
# 4  yes  yes <NA>
# 5 <NA>   ok  yes
# 6   ok <NA>   ok
# 7  yes  yes <NA>
# 8 <NA>   ok  yes
# 9   ok <NA>   ok

Toy data:

df <- structure(list(c1 = structure(c(3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L), .Label = c("no", "ok", "yes"), class = "factor"), c2 = structure(c(3L, 
2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L), .Label = c("no", "ok", "yes"
), class = "factor"), c3 = structure(c(1L, 3L, 2L, 1L, 3L, 2L, 
1L, 3L, 2L), .Label = c("no", "ok", "yes"), class = "factor")), .Names = c("c1", 
"c2", "c3"), row.names = c(NA, -9L), class = "data.frame")
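
One way to check the result is to inspect the levels and count the NAs per column. A small sketch on made-up data follows; the names `d` and `res` are illustrative and not part of the answer.

```r
# Made-up data with the same kind of columns as the toy data
d <- data.frame(
  c1 = factor(c("yes", "no", "ok")),
  c2 = factor(c("yes", "ok", "no"))
)

toRemove <- "no"
res <- data.frame(lapply(d, function(x)
  factor(as.character(x), levels = levels(x)[levels(x) != toRemove])))

lapply(res, levels)   # "no" is no longer listed for any column
colSums(is.na(res))   # one NA per column here, where "no" used to be
```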

Upvotes: 2

lmo

Reputation: 38500

I think this should accomplish what you are trying to do.

dfNew <- data.frame(lapply(df, function(x) {is.na(x[x=="no"]) <- TRUE; droplevels(x)}))

Data:

set.seed(1234)
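# stringsAsFactors=TRUE is needed in R >= 4.0, where data.frame() no longer
# converts strings to factors by default (droplevels() fails on character columns)
df <- data.frame(q1=sample(c("yes", "no", "maybe"), 20, replace=TRUE),
                 q2=sample(c("yes", "no", "maybe"), 20, replace=TRUE),
                 q3=sample(c("yes", "no", "maybe"), 20, replace=TRUE),
                 stringsAsFactors=TRUE)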
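
Note that `droplevels()` removes every unused level, not only "no". The sketch below, on a made-up factor `x` (an illustrative name), contrasts that with removing just the one level.

```r
# Made-up factor with a level ("maybe") that has no observations
x <- factor(c("yes", "no", "yes"), levels = c("maybe", "no", "yes"))

# Approach from this answer: set "no" to NA, then drop all unused levels
y <- x
is.na(y[y == "no"]) <- TRUE
levels(droplevels(y))                                  # "yes"

# Alternative: remove only "no" and keep other unused levels
levels(factor(x, levels = setdiff(levels(x), "no")))   # "maybe" "yes"
```

Which behaviour is preferable depends on whether empty levels should still show up in later tables or plots.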

Upvotes: 3
