MCS

Reputation: 1101

Create a dummy variable based on character vectors in R

I want to create a dummy variable that equals 1 if all entries in columns value_1 to value_3 are either equal to a given character (e.g. "C") or NA, and 0 otherwise.

Toy example:

df <- data.frame(state=rep("state"),
               candidate=c("a","b","c"),
               value_1= c("A","B","C"),
               value_2= c("A","B",NA),
               value_3= c("C",NA,NA), stringsAsFactors = FALSE)

Desired output:

df <- data.frame(state=rep("state"),
             candidate=c("a","b","c"),
             value_1= c("A","B","C"),
             value_2= c("A","B",NA),
             value_3= c("C",NA,NA), 
             dummy=c(0,0,1),stringsAsFactors = FALSE)

I tried the following, but it does not work:

df$dummy <- ifelse(df[-(1:2)] %in% c("C","NA"),1,0)

Upvotes: 1

Views: 301

Answers (3)

akrun

Reputation: 886938

An option using the tidyverse:

library(tidyverse)
df %>% 
   mutate(dummy = pmap_int(select(., value_1:value_3),
        ~ +(!sum(c(...) != "C", na.rm = TRUE))))
#    state candidate value_1 value_2 value_3 dummy
#1 state         a       A       A       C     0  
#2 state         b       B       B    <NA>     0
#3 state         c       C    <NA>    <NA>     1

Upvotes: 0

Frank

Reputation: 66819

Another way:

rowSums(df[-(1:2)] != "C", na.rm=TRUE) == 0
# [1] FALSE FALSE  TRUE

How it works:

  • Make a matrix of checks for non-"C" values
  • Count non-"C" values by row, skipping NAs
  • If the count is 0, TRUE; else, FALSE
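The three steps above can be traced one at a time on the toy data. This is only a sketch; the intermediate names not_c and n_not_c are introduced here for illustration and are not part of the original answer.

```r
df <- data.frame(state = "state",
                 candidate = c("a", "b", "c"),
                 value_1 = c("A", "B", "C"),
                 value_2 = c("A", "B", NA),
                 value_3 = c("C", NA, NA), stringsAsFactors = FALSE)

# Step 1: logical matrix, TRUE where a value is not "C" (NA stays NA)
not_c <- df[-(1:2)] != "C"

# Step 2: count the non-"C" values in each row, skipping NAs
n_not_c <- rowSums(not_c, na.rm = TRUE)

# Step 3: a row qualifies when that count is 0
df$dummy <- as.integer(n_not_c == 0)
df$dummy
```

Wrapping the comparison in as.integer() turns the TRUE/FALSE result into the 0/1 dummy the question asks for.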

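Confusingly, df[-(1:2)] == "C" yields a matrix, while df[-(1:2)] %in% "C" does not, because %in% treats the data frame as a list of columns. To use %in%, convert with as.matrix(df[-(1:2)]) first; the result is then a plain vector in column order, so its dimensions have to be restored before taking row sums.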
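A minimal sketch of the as.matrix route on the toy data. Since %in% drops the matrix dimensions, they are reassigned by hand here; the names m and ok are illustrative only.

```r
df <- data.frame(state = "state",
                 candidate = c("a", "b", "c"),
                 value_1 = c("A", "B", "C"),
                 value_2 = c("A", "B", NA),
                 value_3 = c("C", NA, NA), stringsAsFactors = FALSE)

m <- as.matrix(df[-(1:2)])   # character matrix, 3 x 3

# %in% matches NA like any other value, so it can be listed directly
ok <- m %in% c("C", NA)      # plain logical vector of length 9
dim(ok) <- dim(m)            # restore the row/column layout

# A row qualifies when none of its entries falls outside {"C", NA}
res <- rowSums(!ok) == 0
res
```

Because NA is matched explicitly, this variant needs no na.rm argument.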

Upvotes: 3

Ronak Shah

Reputation: 388807

We can use apply row-wise and check whether all the entries in the selected columns are equal to "C", ignoring the NA values.

cols <- grep("^value", names(df))
df$dummy <- as.integer(apply(df[cols] == "C", 1, all, na.rm = TRUE))

df
#  state candidate value_1 value_2 value_3 dummy
#1 state         a       A       A       C     0
#2 state         b       B       B    <NA>     0
#3 state         c       C    <NA>    <NA>     1

As for your attempt, %in% does not work element-wise on an entire data frame. You need lapply/sapply to check the values column by column, and then combine the per-column results within each row. You can also avoid ifelse here:

df$dummy <- as.integer(Reduce(`&`, lapply(df[-c(1:2)], function(x) x %in% c(NA, "C"))))

Upvotes: 3
