Xavier

Reputation: 55

R code: how to generate variable based on multiple conditions from other variables

I am a beginner R user.

This is my dataset:

factor1 <- c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8,8,9, 9, 10, 10)
factor2 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,16,17, 18, 19, 20)
factor3 <- c("a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "c", "c", "c", "c", "c", "d", "d", "d", "d", "d")
factor4 <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150,160,170, 180, 190, NA)
dataset <- data.frame(factor1, factor2, factor3, factor4) 

I created a new variable this way:

dataset$newvar <-"NA"

How to do the following:

I want newvar to take the value 1 if factor1 >= 5 and factor2 < 19 and (factor3 == "b" or factor3 == "c") and factor4 is not missing and newvar is still missing.

Ideally I want to specify different conditions, so some observations will be value 1, 2, 3 and 4 in the variable newvar dependent on the values of several other variables.

This is very simple and intuitive in Stata, and I would like to know if there is a simple and intuitive way to do the same in R.

Upvotes: 3

Views: 40189

Answers (3)

Tfsnuff

Reputation: 181

Generate a new variable based on several conditions for several values.

This bit of the question was not explicitly addressed:

Ideally I want to specify different conditions, so some observations will be value 1, 2, 3 and 4 in the variable newvar dependent on the values of several other variables.

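A simple solution would be to use case_when from the dplyr package. Similar to Stata's recode, it allows you to specify several target values in one call. Conditions are checked in order, and each row gets the value of the first condition it matches.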

It works the following way:

newvar = case_when(
condition1 ~ target value,
condition2 ~ target value)

e.g. var1 == 1 ~ 0

Important: you need a comma after each condition line (except the last one).

library(dplyr)

dataset <- mutate(dataset,
        newvar = case_when(
               factor1 >= 5 & factor2<19 & (factor3 =="b" | factor3 =="c")  ~ 1, 
               factor1 == 1 ~ 2,
               factor1 == 2 ~ 3,
               TRUE ~ NA_real_ # This is for all other values 
             ))                # not covered by the above.

dataset


#       factor1 factor2 factor3 factor4 newvar
# 1        1       1       a      10      2
# 2        1       2       a      20      2
# 3        2       3       a      30      3
# 4        2       4       a      40      3
# 5        3       5       a      50     NA
# 6        3       6       b      60     NA
# 7        4       7       b      70     NA
# 8        4       8       b      80     NA
# 9        5       9       b      90      1
# 10       5      10       b     100      1
# 11       6      11       c     110      1
# 12       6      12       c     120      1
# 13       7      13       c     130      1
# 14       7      14       c     140      1
# 15       8      15       c     150      1
# 16       8      16       d     160     NA
# 17       9      17       d     170     NA
# 18       9      18       d     180     NA
# 19      10      19       d     190     NA
# 20      10      20       d      NA     NA

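Note: a plain NA is of type logical, and older versions of dplyr reject it as a target value when the other targets are numbers or strings. (As far as I know, dplyr 1.1.0 and later accept a plain NA.) To be safe, use the typed constant that matches your other target values:

  • NA_character_
  • NA_real_
  • NA_complex_
  • NA_integer_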
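
As a quick check of why the type matters, here is a minimal base R sketch that prints the class of each missing-value constant. The variable name na_classes is just an illustrative choice. The idea is that the NA you pick should have the same class as the other target values, for example NA_real_ next to numbers and NA_character_ next to strings.

```r
# Plain NA is logical; each typed variant carries the type of the
# values it is meant to sit alongside.
na_classes <- c(
  plain     = class(NA),
  real      = class(NA_real_),
  integer   = class(NA_integer_),
  character = class(NA_character_),
  complex   = class(NA_complex_)
)
print(na_classes)
```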

Upvotes: 4

Jaap

Reputation: 83275

In base R you can just do (promoting my comment to an answer):

dataset$newvar <- NA
dataset[dataset$factor1 >= 5 & dataset$factor2 < 19 & (dataset$factor3=="b" | dataset$factor3 =="c"), "newvar"] <- 1

or, if rows with a missing factor4 should also be excluded:

dataset$newvar <- NA
indx <- dataset$factor1 >= 5 & dataset$factor2 < 19 & (dataset$factor3=="b" | dataset$factor3 =="c") & !is.na(dataset$factor4)
dataset[indx, "newvar"] <- 1
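
To cover the "newvar is equal to missing" part of the question and the wish for several values, the same idea can be applied rule by rule. The following is only a sketch: the second rule (factor1 <= 2 gives 2) is a made-up example and not part of the question, and the names rule1 and rule2 are illustrative. The data frame is rebuilt so the block runs on its own.

```r
# Rebuild the example data from the question.
dataset <- data.frame(
  factor1 = rep(1:10, each = 2),
  factor2 = 1:20,
  factor3 = rep(c("a", "b", "c", "d"), each = 5),
  factor4 = c(seq(10, 190, by = 10), NA)
)

# Start with a numeric missing value in every row.
dataset$newvar <- NA_real_

# Rule 1: the condition from the question, applied only to rows where
# newvar is still missing. which() drops any NA in the logical vector,
# so the assignment stays safe even if a comparison returns NA.
rule1 <- with(dataset, factor1 >= 5 & factor2 < 19 &
                factor3 %in% c("b", "c") & !is.na(factor4) & is.na(newvar))
dataset$newvar[which(rule1)] <- 1

# Rule 2 (hypothetical): again only fills rows that are still missing,
# so earlier rules take precedence over later ones.
rule2 <- with(dataset, factor1 <= 2 & is.na(newvar))
dataset$newvar[which(rule2)] <- 2

# Count how many rows received each value, including the missing ones.
print(table(dataset$newvar, useNA = "ifany"))
```

Because every rule includes is.na(newvar), the order of the rules decides which value wins when a row satisfies more than one condition.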

Upvotes: 2

C_Z_

Reputation: 7816

Using dplyr

library(dplyr)

dataset %>%
  mutate(newvar = ifelse(factor1 >= 5 &    # >= 5, as stated in the question
                         factor2 < 19 & 
                         (factor3=="b" | factor3=="c") & 
                         !is.na(factor4), 1, NA))
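
If several values are needed, ifelse() calls can be nested, with each "no" branch holding the next rule. Below is a minimal sketch in base R so that it runs without any packages; the same ifelse() expression could equally go inside mutate(). The second rule (factor1 <= 2 gives 2) is a made-up example and not part of the question.

```r
# Rebuild the example data from the question.
dataset <- data.frame(
  factor1 = rep(1:10, each = 2),
  factor2 = 1:20,
  factor3 = rep(c("a", "b", "c", "d"), each = 5),
  factor4 = c(seq(10, 190, by = 10), NA)
)

# Nested ifelse(): the first rule that is TRUE decides the value,
# and rows matching no rule end up as NA.
dataset$newvar <- with(dataset,
  ifelse(factor1 >= 5 & factor2 < 19 &
           factor3 %in% c("b", "c") & !is.na(factor4), 1,
         ifelse(factor1 <= 2, 2, NA))
)

print(dataset$newvar)
```

Beyond two or three rules the nesting gets hard to read, which is where case_when tends to be the clearer choice.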

Upvotes: 0
