kenglish95

Reputation: 5

Creating new variable based on a specific result in any of a list of variables

I want to create a new conditional variable like the one below: a variable that flags any positive result across a list of other variables. I've been trying to use case_when but having no luck.

variable 1   variable 2   variable 3   New variable
1            0            0            1
0            0            1            1
0            1            1            1
0            0            0            0

Upvotes: 0

Views: 261

Answers (5)

akrun

Reputation: 887213

We can use Reduce with | in base R. The | operator treats any non-zero value as TRUE and 0 as FALSE, so Reduce combines the columns elementwise and returns TRUE for each row that has at least one non-zero value. We then coerce the logical to binary with unary + (TRUE -> 1, FALSE -> 0).

df$new_variable <- +(Reduce(`|`, df))

data

df <- structure(list(variable1 = c(1L, 0L, 0L, 0L), variable2 = c(0L, 
0L, 1L, 0L), variable3 = c(0L, 1L, 1L, 0L)), row.names = c(NA, 
-4L), class = "data.frame")
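To see what Reduce is doing here, the call can be unrolled by hand. This is only a minimal sketch on the same data; the intermediate names step1 and step2 are made up for illustration.

```r
# Same data as in the answer above
df <- data.frame(
  variable1 = c(1L, 0L, 0L, 0L),
  variable2 = c(0L, 0L, 1L, 0L),
  variable3 = c(0L, 1L, 1L, 0L)
)

# Reduce(`|`, df) folds the columns left to right with elementwise OR
step1 <- df$variable1 | df$variable2   # TRUE FALSE TRUE FALSE
step2 <- step1 | df$variable3          # TRUE TRUE TRUE FALSE

# The hand-unrolled result matches the Reduce() call
same_as_reduce <- identical(step2, Reduce(`|`, df))

# Unary + coerces the logical vector to integer 0/1
new_variable <- +step2
```

Because | tests for non-zero rather than positive, a negative value would also count as TRUE with this approach.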

Upvotes: 1

AnilGoyal

Reputation: 26218

Using cur_data() in dplyr (from dplyr 1.1.0 onward, cur_data() is deprecated in favour of pick(everything())):

library(dplyr)

df %>% mutate(new_v = +(rowSums(cur_data()) > 0))

#>   variable1 variable2 variable3 new_v
#> 1         1         0         0     1
#> 2         0         0         1     1
#> 3         0         1         1     1
#> 4         0         0         0     0

Created on 2021-06-08 by the reprex package (v2.0.0)
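One caveat with summing: if the columns could ever hold negative values, positives and negatives can cancel out and rowSums(...) > 0 would miss a row. Below is a hedged base R sketch of a variant that tests each cell first; the data frame df_neg and its values are invented for illustration and are not from the question.

```r
# Hypothetical data with a negative value (not from the question)
df_neg <- data.frame(
  variable1 = c(1, 0, 2),
  variable2 = c(-1, 0, 0),
  variable3 = c(0, 0, 0)
)

# Summing raw values: row 1 sums to 0, so its positive value is missed
by_sum <- +(rowSums(df_neg) > 0)

# Testing each cell first, then counting TRUEs per row, avoids the cancellation
by_cell <- +(rowSums(df_neg > 0) > 0)
```

With the 0/1 data in the question the two versions agree, so this only matters if the inputs are not restricted to non-negative values.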

Upvotes: 2

Anoushiravan R

Reputation: 21918

I hope I understood what you were looking for correctly. I created new_var variable based on the presence of any positive value in a row:

library(dplyr)

df %>%
  rowwise() %>%
  mutate(new_var = +any(c_across(everything()) > 0, na.rm = TRUE))

# A tibble: 4 x 4
# Rowwise: 
  variable1 variable2 variable3 new_var
      <int>     <int>     <int>   <int>
1         1         0         0       1
2         0         0         1       1
3         0         1         1       1
4         0         0         0       0
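The na.rm = TRUE argument only matters when a row contains missing values. A small base R sketch of the difference, using made-up vectors in place of a row:

```r
# Hypothetical row with a missing value (not from the question)
row_vals <- c(NA, 0, 0)

# Without na.rm the result is NA, because the missing cell might be positive
without_na_rm <- any(row_vals > 0)

# With na.rm = TRUE the NA is dropped and the remaining zeros give FALSE
with_na_rm <- any(row_vals > 0, na.rm = TRUE)

# A positive value decides the result either way
has_positive <- any(c(NA, 1, 0) > 0)
```

So with na.rm = TRUE, a row of zeros and NAs gets 0 in new_var rather than NA.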

Upvotes: 2

Rory S

Reputation: 1298

You can use pmap_dbl to apply an if_else statement that checks whether any values of var1, var2 or var3 are positive. This solution works no matter what the numeric values of the above variables are.

library(tidyverse)

# reproduce your data
mydata <- tibble(
  var1 = c(1,0,0,0),
  var2 = c(0,0,1,0),
  var3 = c(0,1,1,0)
)

mydata %>%
  mutate(
     newvar = pmap_dbl(list(var1, var2, var3), ~ if_else(any(c(..1, ..2, ..3) > 0), 1, 0))
  )

Upvotes: 0

Ronak Shah

Reputation: 389047

Because the values are all 0 or 1, the maximum of each row is the indicator you want. pmax computes it elementwise across the columns.

df$new_variable <- do.call(pmax, df)
df

#  variable1 variable2 variable3 new_variable
#1         1         0         0            1
#2         0         0         1            1
#3         0         1         1            1
#4         0         0         0            0

data

df <- structure(list(variable1 = c(1L, 0L, 0L, 0L), variable2 = c(0L, 
0L, 1L, 0L), variable3 = c(0L, 1L, 1L, 0L)), row.names = c(NA, 
-4L), class = "data.frame")
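If the columns can hold values other than 0 and 1, the row maximum is no longer a 0/1 flag by itself. A minimal sketch of converting it to one; df_big and its values are hypothetical:

```r
# Hypothetical data where values exceed 1 (not from the question)
df_big <- data.frame(
  variable1 = c(3L, 0L, 0L),
  variable2 = c(0L, 0L, 5L)
)

# Row-wise maximum: 3 0 5, which is not a 0/1 flag
row_max <- do.call(pmax, df_big)

# Compare against 0 and coerce to get the indicator: 1 0 1
new_variable <- +(row_max > 0)
```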

Upvotes: 1
