Reputation: 59
I have a data set looks like, dat. I want to create one new variable (newVar) with values 1 through 4 using the Q1 through Q6 in R. Can anyone help how to do this, please?
newVar of 1: If any variable in Q1 through Q6 = "A"
newVar of 2: **If none of variables in Q1 through Q6 = "A", and** If any variable in Q59_1 through Q6= "B"
newVar of 3: **If none of variables in Q1 through Q6 = "A" or "B", and** If any variable in Q1 through Q6= "C"
newVar of 4: If none of variables in Q1 through Q6 = "A" or "B" or "C" , and**strong text** If any variable in Q1 through Q6= "D"
dat = data.frame(Q1 = c("A","D","B","A","D","C","B"),
Q2 = c("C","A","B","A","C","C","B"),
Q3 = c("B","D","D","A","B","C","A"),
Q4 = c("A","C","A","D","B","D","B"),
Q5 = c("C","D","D","A","A","C","B"),
Q6 = c("D","B","A","A","C","C","A"),
Q7 = c("5","1","2","2","4","0","3"),
Q8 = c("T","U","O","O","Z","W","Y"),
Q9 = c("X","T","V","P","I","U","Y"))
Additional comment: I actually need all the conditions especially the bold parts (If none of variables in ...) such as NewVar 2: If none of variables in Q1 through Q6 = "A", and If any variable in Q59_1 through Q6= "B" 3: If none of variables in Q1 through Q6 = "A" or "B", and If any variable in Q1 through Q6= "C" 4: If none of variables in Q1 through Q6 = "A" or "B" or "C" , and If any variable in Q1 through Q6= "D"
Upvotes: 0
Views: 674
Reputation: 39647
You can use match
with LETTERS
and min
to test if any of the LETTERS is present.
dat$newVar <- apply(dat[1:6], 1, function(x) min(match(x, LETTERS[1:4])))
dat
# Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 newVar
#1 A C B A C D 5 T X 1
#2 D A D C D B 1 U T 1
#3 B B D A D A 2 O V 1
#4 A A A D A A 2 O P 1
#5 D C B B A C 4 Z I 1
#6 C C C D C C 0 W U 3
#7 B B A B B A 3 Y Y 1
Upvotes: 1
Reputation: 388817
In base R, you can use apply
row-wise on selected columns and assign value to newVar
checking conditions one by one.
cols <- paste0('Q', 1:6)
dat$newVar <- apply(dat[cols], 1, function(x) {
if(any(x == 'A')) 1
else if(any(x == 'B')) 2
else if(any(x == 'C')) 3
else if(any(x == 'D')) 4
else NA
})
dat
# Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 newVar
#1 A C B A C D 5 T X 1
#2 D A D C D B 1 U T 1
#3 B B D A D A 2 O V 1
#4 A A A D A A 2 O P 1
#5 D C B B A C 4 Z I 1
#6 C C C D C C 0 W U 3
#7 B B A B B A 3 Y Y 1
Upvotes: 0
Reputation: 16978
Using packages tidyr
and dplyr
:
dat %>%
mutate(id = row_number()) %>%
pivot_longer(cols=starts_with("Q"), names_to="Q", names_prefix="Q") %>%
group_by(id) %>%
mutate(newVar=case_when(Q %in% 1:6 & any(value == "A") ~ 1,
Q %in% 1:6 & any(value == "B") ~ 2,
Q %in% 1:6 & any(value == "C") ~ 3,
Q %in% 1:6 & any(value == "D") ~ 4)) %>%
pivot_wider(names_from="Q", names_prefix="Q", names_sep="") %>%
fill(everything(), .direction="down") %>%
slice(2)
I'm not sure if I got your conditions right. And there are surely more tidy ways, but this one yields
# A tibble: 7 x 11
# Groups: id [7]
id newVar Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9
<int> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 1 A C B A C D 5 T X
2 2 1 D A D C D B 1 U T
3 3 1 B B D A D A 2 O V
4 4 1 A A A D A A 2 O P
5 5 1 D C B B A C 4 Z I
6 6 3 C C C D C C 0 W U
7 7 1 B B A B B A 3 Y Y
Upvotes: 1