Reputation: 1012
I have a data frame with 6 columns, all of which are factors.
I want to delete a specific level, for example "no", from every factor in which it appears.
That is, I want to drop the level "no" from each factor and, at the same time, set every answer equal to "no" to NA.
I have tried this code:
sapply(fact,function(x) levels(x)[levels(x) == "no"] <- NULL)
But this code doesn't work.
How can I do this?
Upvotes: 2
Views: 6893
Reputation: 732
Great answers already. I'll add that if not all of your columns are factors, or if you want to keep every factor level other than the one being removed (including levels with no data), you'll need a more general approach:
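# Define a helper function
removeOneLevel <- function(v, badlevel) {
  # Rebuild the factor without badlevel: values equal to badlevel become NA,
  # and every other level (even an unused one) keeps its label and position.
  # (Calling droplevels() first and then reassigning levels by position can
  # mislabel values when an unused level sits between used ones.)
  factor(v, levels = setdiff(levels(v), badlevel))
}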
# Use dplyr to perform that function on all factor columns
library(dplyr)
dfNew = mutate_if(df, is.factor, removeOneLevel, badlevel = 'no')
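If you would rather not depend on dplyr, the same idea can be sketched in base R. This is only an illustration: the data frame, the column names, and the helper name drop_level are made up for the example.

```r
# Hypothetical example data: a factor with an unused level, plus a numeric column
df <- data.frame(
  ans = factor(c("yes", "no", "yes"), levels = c("maybe", "no", "yes")),
  n   = c(1, 2, 3)
)

# Drop one level from a factor; values equal to that level become NA,
# all other levels (including unused ones) are kept
drop_level <- function(v, badlevel) {
  factor(v, levels = setdiff(levels(v), badlevel))
}

# Apply the helper to factor columns only, leaving other columns untouched
is_fac <- vapply(df, is.factor, logical(1))
df_new <- df
df_new[is_fac] <- lapply(df[is_fac], drop_level, badlevel = "no")

levels(df_new$ans)  # "maybe" "yes"
```

The unused level "maybe" survives, the "no" entry becomes NA, and the numeric column n is left as it was.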
Upvotes: 1
Reputation: 12937
How about this:
> df
# c1 c2 c3
# 1 yes yes no
# 2 no ok yes
# 3 ok no ok
# 4 yes yes no
# 5 no ok yes
# 6 ok no ok
# 7 yes yes no
# 8 no ok yes
# 9 ok no ok
toRemove <- "no"
data.frame(lapply(df, function(x) {
  factor(as.character(x), levels = levels(x)[levels(x) != toRemove])
}))
# c1 c2 c3
# 1 yes yes <NA>
# 2 <NA> ok yes
# 3 ok <NA> ok
# 4 yes yes <NA>
# 5 <NA> ok yes
# 6 ok <NA> ok
# 7 yes yes <NA>
# 8 <NA> ok yes
# 9 ok <NA> ok
Toy data:
df <- structure(list(c1 = structure(c(3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L), .Label = c("no", "ok", "yes"), class = "factor"), c2 = structure(c(3L,
2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L), .Label = c("no", "ok", "yes"
), class = "factor"), c3 = structure(c(1L, 3L, 2L, 1L, 3L, 2L,
1L, 3L, 2L), .Label = c("no", "ok", "yes"), class = "factor")), .Names = c("c1",
"c2", "c3"), row.names = c(NA, -9L), class = "data.frame")
Upvotes: 2
Reputation: 38500
I think this should accomplish what you are trying to do.
dfNew <- data.frame(lapply(df, function(x) {is.na(x[x=="no"]) <- TRUE; droplevels(x)}))
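Data:
set.seed(1234)
# stringsAsFactors = TRUE is needed in R >= 4.0.0, where data.frame()
# no longer converts strings to factors by default
df <- data.frame(q1 = sample(c("yes", "no", "maybe"), 20, replace = TRUE),
                 q2 = sample(c("yes", "no", "maybe"), 20, replace = TRUE),
                 q3 = sample(c("yes", "no", "maybe"), 20, replace = TRUE),
                 stringsAsFactors = TRUE)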
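One thing to be aware of: droplevels() removes every unused level, not only "no". The small sketch below uses a made-up factor x to show the difference from rebuilding the factor with an explicit level set. Which behaviour you want depends on your data.

```r
# Hypothetical factor whose level "maybe" never occurs in the data
x <- factor(c("yes", "no", "yes"), levels = c("maybe", "no", "yes"))

# Set "no" to NA, then drop unused levels: "maybe" is dropped as well
y <- x
y[y == "no"] <- NA
lev_dropped <- levels(droplevels(y))

# Rebuild with an explicit level set: only "no" is removed
lev_kept <- levels(factor(x, levels = setdiff(levels(x), "no")))

lev_dropped  # "yes"
lev_kept     # "maybe" "yes"
```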
Upvotes: 3