Reputation: 55
I am a beginner R user.
This is my dataset:
factor1 <- c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8,8,9, 9, 10, 10)
factor2 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,16,17, 18, 19, 20)
factor3 <- c("a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "c", "c", "c", "c", "c", "d", "d", "d", "d", "d")
factor4 <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150,160,170, 180, 190, NA)
dataset <- data.frame(factor1, factor2, factor3, factor4)
I created a new variable this way:
dataset$newvar <-"NA"
How to do the following:
I want newvar to take the value 1 if factor1 >= 5 and factor2 < 19 and (factor3 == "b" or factor3 == "c") and factor4 is not missing and newvar is still missing.
Ideally I want to specify different conditions, so some observations will be value 1, 2, 3 and 4 in the variable newvar dependent on the values of several other variables.
This is very simple and intuitive in Stata, and I would like to know if there is a simple and intuitive way to do the same in R.
Upvotes: 3
Views: 40189
Reputation: 181
Generate a new variable based on several conditions for several values.
This bit of the question was not explicitly addressed:
Ideally I want to specify different conditions, so some observations will be value 1, 2, 3 and 4 in the variable newvar dependent on the values of several other variables.
A simple solution would be to use case_when. Similar to Stata's recode, it allows you to specify several values simultaneously.
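It works the following way:
newvar = case_when(
  condition1 ~ target_value1,
  condition2 ~ target_value2)
For example, var1 == 1 ~ 0 assigns 0 wherever var1 equals 1. Conditions are checked in order, and the first one that matches a row determines its value.
Important: the condition lines must be separated by commas.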
library(dplyr)
dataset <- mutate(dataset,
  newvar = case_when(
    factor1 >= 5 & factor2 < 19 & (factor3 == "b" | factor3 == "c") ~ 1,
    factor1 == 1 ~ 2,
    factor1 == 2 ~ 3,
    TRUE ~ NA_real_  # all other values not covered by the above
  ))
dataset
#    factor1 factor2 factor3 factor4 newvar
#  1       1       1       a      10      2
#  2       1       2       a      20      2
#  3       2       3       a      30      3
#  4       2       4       a      40      3
#  5       3       5       a      50     NA
#  6       3       6       b      60     NA
#  7       4       7       b      70     NA
#  8       4       8       b      80     NA
#  9       5       9       b      90      1
# 10       5      10       b     100      1
# 11       6      11       c     110      1
# 12       6      12       c     120      1
# 13       7      13       c     130      1
# 14       7      14       c     140      1
# 15       8      15       c     150      1
# 16       8      16       d     160     NA
# 17       9      17       d     170     NA
# 18       9      18       d     180     NA
# 19      10      19       d     190     NA
# 20      10      20       d      NA     NA
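Note: in dplyr versions before 1.1.0 you cannot use a bare NA (missing) as a target value, because all target values must have the same type. Instead, use the typed missing value that matches the other targets:
NA_integer_
NA_real_
NA_complex_
NA_character_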
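To see why typed constants such as NA_real_ exist, here is a minimal base R sketch (no dplyr needed). It only inspects the class of each constant; the variable names na_types and all_missing are illustrative, not part of the answer above.

```r
# A bare NA is a logical value; the typed constants carry a specific type.
na_types <- c(
  bare      = class(NA),
  integer   = class(NA_integer_),
  real      = class(NA_real_),
  complex   = class(NA_complex_),
  character = class(NA_character_)
)
print(na_types)

# Whatever their type, all of them are still treated as missing.
all_missing <- all(vapply(
  list(NA, NA_integer_, NA_real_, NA_complex_, NA_character_),
  is.na, logical(1)
))
print(all_missing)
```

So when the other target values are numbers like 1, 2 and 3, NA_real_ is the matching choice; for character targets it would be NA_character_.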
Upvotes: 4
Reputation: 83275
In base R you can just do (promoting my comment to an answer):
dataset$newvar <- NA
dataset[dataset$factor1 >= 5 & dataset$factor2 < 19 & (dataset$factor3=="b" | dataset$factor3 =="c"), "newvar"] <- 1
or, to also require that factor4 is not missing:
dataset$newvar <- NA
indx <- dataset$factor1 >= 5 & dataset$factor2 < 19 & (dataset$factor3=="b" | dataset$factor3 =="c") & !is.na(dataset$factor4)
dataset[indx, "newvar"] <- 1
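The same base R pattern can be repeated to assign several values, which is the second part of the question. The sketch below is one possible way to do it, not the only one: rule1 is the condition from the question, while rule2 and rule3 are hypothetical rules chosen only for illustration. Each step fills only rows where newvar is still missing, so earlier rules take priority.

```r
# Rebuild the question's data so the sketch is self-contained.
dataset <- data.frame(
  factor1 = rep(1:10, each = 2),
  factor2 = 1:20,
  factor3 = rep(c("a", "b", "c", "d"), each = 5),
  factor4 = c(seq(10, 190, by = 10), NA)
)
dataset$newvar <- NA_real_

# Rule 1: the condition from the question.
rule1 <- with(dataset,
              factor1 >= 5 & factor2 < 19 &
                factor3 %in% c("b", "c") & !is.na(factor4))
dataset$newvar[rule1 & is.na(dataset$newvar)] <- 1

# Rules 2 and 3 are hypothetical, shown only to illustrate further values.
rule2 <- dataset$factor1 == 1
dataset$newvar[rule2 & is.na(dataset$newvar)] <- 2

rule3 <- dataset$factor1 == 2
dataset$newvar[rule3 & is.na(dataset$newvar)] <- 3

# Count how many rows received each value (including those still missing).
print(table(dataset$newvar, useNA = "ifany"))
```

Using `%in%` instead of chained `==` comparisons keeps the condition short when more levels of factor3 need to be matched.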
Upvotes: 2
Reputation: 7816
Using dplyr:
library(dplyr)
dataset <- dataset %>%
  mutate(newvar = ifelse(factor1 >= 5 &
                           factor2 < 19 &
                           (factor3 == "b" | factor3 == "c") &
                           !is.na(factor4), 1, NA))
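If more than one value is needed, ifelse calls can be nested. The following is a rough base R sketch under the assumption that the extra rules are factor1 == 1 and factor1 == 2; those two rules are hypothetical and only serve as an illustration. The helper name in_bc is also made up for this example.

```r
# Rebuild the question's data so the sketch is self-contained.
dataset <- data.frame(
  factor1 = rep(1:10, each = 2),
  factor2 = 1:20,
  factor3 = rep(c("a", "b", "c", "d"), each = 5),
  factor4 = c(seq(10, 190, by = 10), NA)
)

# TRUE for rows where factor3 is "b" or "c".
in_bc <- dataset$factor3 %in% c("b", "c")

# Nested ifelse: the first matching condition decides the value.
dataset$newvar <- ifelse(
  dataset$factor1 >= 5 & dataset$factor2 < 19 & in_bc & !is.na(dataset$factor4), 1,
  ifelse(dataset$factor1 == 1, 2,          # hypothetical second rule
         ifelse(dataset$factor1 == 2, 3,   # hypothetical third rule
                NA))                       # everything else stays missing
)

print(dataset)
```

Nesting becomes hard to read beyond two or three levels, at which point a flat list of conditions is usually easier to maintain.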
Upvotes: 0