Reputation: 47
Imagine a data frame with multiple factor columns that share the same levels but hold different entries (perhaps coming from a survey).
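# levels = 1:4 keeps all four levels even if one is never drawn
f1 <- factor(sample(1:4, 10, replace = TRUE), levels = 1:4)
f2 <- factor(sample(1:4, 10, replace = TRUE), levels = 1:4)
f3 <- factor(sample(1:4, 10, replace = TRUE), levels = 1:4)
df <- data.frame(id = 1:10, f1, f2, f3)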
I want to create a new factor variable that takes the value 1 if at least two of f1, f2, f3 are in level 1 or 2, the value 2 if at least two of f1, f2, f3 are in level 3, the value 3 if at least two of f1, f2, f3 are in level 4, and the value 4 otherwise (if that case can occur at all).
I understand this could be done with deeply nested if/else statements and a long chain of logical operators, but I was wondering whether there is a more elegant solution, perhaps using dplyr functions?
Upvotes: 0
Views: 844
Reputation: 388982
In dplyr you can specify the conditions in case_when:
library(dplyr)
df %>%
  rowwise() %>%
  mutate(result = {
    # the three factor values of the current row
    vec <- c_across(f1:f3)
    case_when(sum(vec %in% 1:2) >= 2 ~ 1,
              sum(vec == 3) >= 2 ~ 2,
              sum(vec == 4) >= 2 ~ 3,
              TRUE ~ 4)
  })
#       id f1    f2    f3    result
#    <int> <fct> <fct> <fct>  <dbl>
#  1     1 4     2     1          1
#  2     2 1     1     1          1
#  3     3 4     2     2          1
#  4     4 4     3     1          4
#  5     5 2     2     1          1
#  6     6 3     4     2          4
#  7     7 4     2     4          3
#  8     8 3     2     2          1
#  9     9 3     1     1          1
# 10    10 2     1     1          1
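rowwise() evaluates the expression once per row, which may get slow on larger data. As a rough sketch (the small hand-made data frame and the helper matrix m are assumptions for illustration, not part of the original post), the same counts can be computed for all rows at once with rowSums():

```r
library(dplyr)

# Hypothetical example data, chosen so that every outcome appears once
df <- data.frame(
  id = 1:5,
  f1 = factor(c(1, 3, 4, 1, 2), levels = 1:4),
  f2 = factor(c(2, 3, 4, 3, 2), levels = 1:4),
  f3 = factor(c(4, 1, 2, 4, 3), levels = 1:4)
)

# Character matrix of the three factor columns, one row per observation
m <- as.matrix(df[c("f1", "f2", "f3")])

df <- df %>%
  mutate(result = case_when(
    rowSums(m == "1" | m == "2") >= 2 ~ 1,  # at least two in level 1 or 2
    rowSums(m == "3") >= 2 ~ 2,             # at least two in level 3
    rowSums(m == "4") >= 2 ~ 3,             # at least two in level 4
    TRUE ~ 4                                # no group reaches two
  ))

df$result
```

The fourth row, with values 1, 3 and 4, falls through to the last branch, so the "otherwise" case can occur whenever the three columns land in three different groups.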
Upvotes: 1
Reputation: 173813
Check to see if this works for you:
f1 <- factor(sample(1:4, 10, replace = TRUE), levels = 1:4)
f2 <- factor(sample(1:4, 10, replace = TRUE), levels = 1:4)
f3 <- factor(sample(1:4, 10, replace = TRUE), levels = 1:4)
df <- data.frame(id = 1:10, f1, f2, f3)
df$f4 <- factor(apply(df[-1], 1, function(x) {
  # Merge level 2 into level 1, then count per group; with the level order
  # c(1, 3, 4) the position of the group that occurs twice is the wanted code
  y <- which(table(factor(replace(as.numeric(x), x == "2", 1), c(1, 3, 4))) > 1)
  if (length(y) == 0) 4 else y
}))
df
#>    id f1 f2 f3 f4
#> 1   1  1  2  4  1
#> 2   2  1  3  2  1
#> 3   3  4  4  2  3
#> 4   4  1  4  2  1
#> 5   5  1  1  1  1
#> 6   6  1  3  3  2
#> 7   7  3  1  3  2
#> 8   8  1  3  4  4
#> 9   9  4  2  1  1
#> 10 10  2  3  3  2
Created on 2020-12-08 by the reprex package (v0.3.0)
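To see what the table-based one-liner does, it may help to unpack the idea for a single row. The helper name classify_row is hypothetical and this is only a sketch, assuming levels 1 and 2 are merged and the groups are ordered as 1/2, 3, 4 so that the position in the count table equals the wanted code:

```r
# Hypothetical helper: classify one row of three factor values
classify_row <- function(x) {
  x <- as.character(x)
  # Treat level 2 as level 1, since both belong to the first group
  merged <- replace(x, x == "2", "1")
  # Count occurrences per group; level order 1, 3, 4 maps to codes 1, 2, 3
  counts <- table(factor(merged, levels = c("1", "3", "4")))
  hit <- which(counts >= 2)
  # With three values at most one group can reach two; otherwise return 4
  if (length(hit) == 0) 4L else unname(hit)
}

classify_row(c("1", "2", "4"))  # two values in level 1 or 2
classify_row(c("2", "3", "3"))  # two values in level 3
classify_row(c("4", "4", "2"))  # two values in level 4
classify_row(c("1", "3", "4"))  # no group has two values
```

Applied over the rows of the three factor columns, for example with apply(df[-1], 1, classify_row), this gives the same kind of result vector as the compact version.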
Upvotes: 1