Theodor

Re-label many factor rows with dplyr

I would like to replicate the following behaviour with dplyr if that is possible.

What I am doing is fairly trivial: I have several factors, each with a known baseline level, and I would like to recode them as 0/1 variables (0 for the baseline, 1 for anything else).

If I simulate a data set like this:

df <- data.frame(id = 1:100,
                 x = factor(sample(c("a", "b", "c"), 100, replace = TRUE)),
                 y = factor(sample(c("a", "b", "c"), 100, replace = TRUE)))

Then I can easily achieve that like this:

fn <- function(x) {
  ifelse(x == "c", 0, 1)
}

df[c("x", "y")] <- apply(df[c("x", "y")], 2, fn) 
df

However, I can't work out how to do this in dplyr. I am thinking about using mutate_which, but I can't get that to work with a custom function like fn.

Answers (1)

Theodor

The answer posted by Psidon,

df %>% mutate(x = fn(x), y = fn(y))

is correct, but it does not generalize well: every column has to be written out by hand.

The answer proposed by Steven Beaupré is more generalizable:

df %>% mutate_at(vars(x:y), funs(if_else(. == "c", 0, 1)))

Or a more transparent version,

df %>% mutate_at(.vars = vars(x:y), .funs = function(x) {ifelse(x == "c", 0, 1)})
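
Since the question asked specifically about a custom function like fn, here is a minimal sketch of passing that named function straight to mutate_at. The set.seed call and the name out are only there for illustration and are not part of the original code:

```r
library(dplyr)

set.seed(1)  # illustrative seed so the simulated data is reproducible
df <- data.frame(id = 1:100,
                 x = factor(sample(c("a", "b", "c"), 100, replace = TRUE)),
                 y = factor(sample(c("a", "b", "c"), 100, replace = TRUE)))

# Custom recoding function from the question: baseline "c" becomes 0, the rest 1
fn <- function(x) {
  ifelse(x == "c", 0, 1)
}

# Apply fn to the selected columns only; id is left untouched
out <- df %>% mutate_at(vars(x, y), fn)
str(out)
```

The function is passed by name, without calling it, so any other recoding function with the same shape can be swapped in.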

My main problem was that this did not work with mutate_each, which has been deprecated in favour of mutate_at, mutate_if and mutate_all:

df %>% mutate_each(funs = function(x) {ifelse(x == "c", 0, 1)}, cols = vars(x, y))
Error: is.fun_list(calls) is not TRUE
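
For readers on newer versions: funs() has been deprecated since dplyr 0.8.0, and from dplyr 1.0.0 onwards the scoped verbs (mutate_at and friends) are superseded by across(). A sketch of the same recoding with across(), assuming dplyr >= 1.0.0 and using a small hand-written data frame (the values and the name out2 are made up for illustration):

```r
library(dplyr)

# Small illustrative data set; the values are invented for this example
df <- data.frame(id = 1:6,
                 x = factor(c("a", "b", "c", "c", "a", "b")),
                 y = factor(c("c", "c", "a", "b", "b", "a")))

# Recode the baseline level "c" to 0 and every other level to 1
out2 <- df %>% mutate(across(c(x, y), ~ if_else(.x == "c", 0, 1)))
out2
```

Here x becomes 1, 1, 0, 0, 1, 1 and y becomes 0, 0, 1, 1, 1, 1, while id is unchanged. A named function such as fn from the question can be passed in place of the lambda, e.g. across(c(x, y), fn).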
