Reputation: 1101
I want to create a dummy variable that equals 1 if all entries in columns value_1 to value_3 are either equal to a given character (e.g. "C") or NA.
Toy example:
df <- data.frame(state = rep("state"),
                 candidate = c("a", "b", "c"),
                 value_1 = c("A", "B", "C"),
                 value_2 = c("A", "B", NA),
                 value_3 = c("C", NA, NA),
                 stringsAsFactors = FALSE)
Desired output:
df <- data.frame(state = rep("state"),
                 candidate = c("a", "b", "c"),
                 value_1 = c("A", "B", "C"),
                 value_2 = c("A", "B", NA),
                 value_3 = c("C", NA, NA),
                 dummy = c(0, 0, 1),
                 stringsAsFactors = FALSE)
I tried this, but it does not work:
df$dummy <- ifelse(df[-(1:2)] %in% c("C","NA"),1,0)
Upvotes: 1
Views: 301
Reputation: 886938
An option using the tidyverse:
library(tidyverse)
df %>%
  # value_1:value_3 selects the whole range of value columns, not just the two endpoints
  mutate(dummy = pmap_int(select(., value_1:value_3),
                          ~ +(!sum(c(...) != "C", na.rm = TRUE))))
#   state candidate value_1 value_2 value_3 dummy
# 1 state         a       A       A       C     0
# 2 state         b       B       B    <NA>     0
# 3 state         c       C    <NA>    <NA>     1
Upvotes: 0
Reputation: 66819
Another way:
rowSums(df[-(1:2)] != "C", na.rm=TRUE) == 0
# [1] FALSE FALSE TRUE
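How it works: df[-(1:2)] != "C" gives a logical matrix with NA wherever the entry is missing. rowSums(..., na.rm=TRUE) then counts, per row, the non-missing entries that differ from "C", so a count of 0 means every entry is either "C" or NA.
Confusingly, df[-(1:2)] == "C" yields a matrix, while df[-(1:2)] %in% "C" does not. To handle the latter, wrap the data in as.matrix(df[-(1:2)]) first.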
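To see the difference concretely, here is a small sketch using the toy df from the question. The names vals, eq, in_list, m and in_mat are only for illustration. Note that %in% on a matrix returns a flat vector, so in this sketch the dimensions are put back by hand.

```r
df <- data.frame(state = "state",
                 candidate = c("a", "b", "c"),
                 value_1 = c("A", "B", "C"),
                 value_2 = c("A", "B", NA),
                 value_3 = c("C", NA, NA),
                 stringsAsFactors = FALSE)

vals <- df[-(1:2)]

# == on a data frame compares cell by cell and returns a logical matrix
eq <- vals == "C"
dim(eq)          # 3 3

# %in% treats the data frame as a list of columns,
# so it returns one value per column, not one per cell
in_list <- vals %in% "C"
length(in_list)  # 3 (the number of columns)

# After as.matrix(), %in% works per cell but drops the dimensions,
# so they are restored before taking row sums
m <- as.matrix(vals)
in_mat <- m %in% c("C", NA)
dim(in_mat) <- dim(m)

# A row qualifies when no cell falls outside {"C", NA}
dummy <- as.integer(rowSums(!in_mat) == 0)
dummy            # 0 0 1
```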
Upvotes: 3
Reputation: 388807
We can use apply row-wise and check whether all the entries in the selected columns are equal to "C", ignoring the NA values.
cols <- grep("^value", names(df))
df$dummy <- as.integer(apply(df[cols] == "C", 1, all, na.rm = TRUE))
df
#   state candidate value_1 value_2 value_3 dummy
# 1 state         a       A       A       C     0
# 2 state         b       B       B    <NA>     0
# 3 state         c       C    <NA>    <NA>     1
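One corner case worth checking with this approach: with na.rm = TRUE, a row whose selected columns are all NA leaves nothing to test, and all() of an empty logical vector is TRUE. That matches the "or are NAs" part of the question. A quick sketch, where the vectors mixed, all_na and other are made up to stand in for single rows:

```r
mixed  <- c("C", NA, NA)            # like row 3 of the toy data
all_na <- c(NA_character_, NA, NA)  # hypothetical row with only missing values
other  <- c("A", "A", "C")          # like row 1 of the toy data

res_mixed  <- all(mixed == "C", na.rm = TRUE)   # TRUE
res_all_na <- all(all_na == "C", na.rm = TRUE)  # TRUE, since all(logical(0)) is TRUE
res_other  <- all(other == "C", na.rm = TRUE)   # FALSE
```

If an all-NA row should get 0 instead, an extra condition such as any(!is.na(x)) would be needed.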
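As far as your attempt is concerned, %in% does not work on an entire data frame, because it treats the data frame as a list of columns. You need sapply/lapply to check the values column by column, and then combine the results across columns so that you get one value per row. In fact, you can avoid ifelse here:
df$dummy <- as.integer(Reduce(`&`, lapply(df[-c(1:2)], function(x) x %in% c(NA, "C"))))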
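One more detail in the original attempt, as far as I can tell: it tests against the string "NA" rather than the missing value NA, and the two are not the same thing. A minimal sketch, with x as a made-up example vector:

```r
x <- c("C", NA, "A")

# The string "NA" is an ordinary value, so the missing entry does not match it
with_string <- x %in% c("C", "NA")  # TRUE FALSE FALSE

# %in% does match a missing entry against NA in the lookup set
with_na <- x %in% c("C", NA)        # TRUE TRUE FALSE

# == propagates NA instead, which is why the other answers use na.rm = TRUE
with_eq <- x == "C"                 # TRUE NA FALSE
```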
Upvotes: 3