BadFrameRate

Reputation: 47

Creating a new factor variable from multiple factor variables, all with same levels

Imagine a data frame with several factor columns that share the same levels but hold different entries (for example, responses from a survey).

f1 <- factor(sample(1:4, 10, replace = TRUE))
f2 <- factor(sample(1:4, 10, replace = TRUE))
f3 <- factor(sample(1:4, 10, replace = TRUE))
df <- data.frame(id = 1:10, f1, f2, f3)

I want to create a new factor variable that takes the value 1 if at least two of the three factors are in level 1 or 2, the value 2 if at least two of f1, f2, f3 are in level 3, the value 3 if at least two of f1, f2, f3 are in level 4, and the value 4 otherwise (if that case can occur at all).

I understand this could be done with deeply nested if-else statements and a long chain of logical operators, but I was wondering whether there is a more elegant solution, perhaps using dplyr functions?

Upvotes: 0

Views: 844

Answers (2)

Ronak Shah

Reputation: 388982

In dplyr, you can specify the conditions in case_when():

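library(dplyr)

df %>%
  rowwise() %>%                      # evaluate the conditions one row at a time
  mutate(result = {
    vec <- c_across(f1:f3)           # the three factor values of the current row
    # Conditions are checked in order; the first one that is TRUE wins
    case_when(sum(vec %in% 1:2) >= 2 ~ 1,
              sum(vec == 3) >= 2 ~ 2,
              sum(vec == 4) >= 2 ~ 3,
              TRUE ~ 4)
  })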

#      id f1    f2    f3    result
#   <int> <fct> <fct> <fct>  <dbl>
# 1     1 4     2     1          1
# 2     2 1     1     1          1
# 3     3 4     2     2          1
# 4     4 4     3     1          4
# 5     5 2     2     1          1
# 6     6 3     4     2          4
# 7     7 4     2     4          3
# 8     8 3     2     2          1
# 9     9 3     1     1          1
#10    10 2     1     1          1
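
The same rule can also be evaluated for all rows at once, without rowwise(). What follows is only a minimal base R sketch of that idea, not part of the code above; the helper name classify_rows and its cols argument are made up for illustration. It relies on the fact that, with three columns, at most one of the three "at least two" conditions can hold in any row, and it returns a factor, as the question asks.

```r
# Hypothetical helper: classify every row in one vectorized pass.
# Assumes the listed columns are factors (or characters) with labels "1" to "4".
classify_rows <- function(data, cols = c("f1", "f2", "f3")) {
  m <- as.matrix(data[cols])            # character matrix of the level labels

  # Per-row counts for each group of levels
  n12 <- rowSums(m == "1" | m == "2")
  n3  <- rowSums(m == "3")
  n4  <- rowSums(m == "4")

  # With three columns only one group can reach a count of 2,
  # so the three terms never add up across groups
  code <- 1 * (n12 >= 2) + 2 * (n3 >= 2) + 3 * (n4 >= 2)
  code[code == 0] <- 4                  # no group reached 2: the "otherwise" case

  factor(code, levels = 1:4)
}

# Small hand-made check, one row per possible outcome
check <- data.frame(
  f1 = factor(c(1, 3, 4, 1), levels = 1:4),
  f2 = factor(c(2, 3, 4, 3), levels = 1:4),
  f3 = factor(c(4, 1, 2, 4), levels = 1:4)
)
classify_rows(check)
```

On the question's data this would be used as df$result <- classify_rows(df). The last row of the check (levels 1, 3, 4) shows that the "otherwise" case does occur.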

Upvotes: 1

Allan Cameron

Reputation: 173813

Check to see if this works for you:

f1 <- factor(sample(1:4, 10, replace = TRUE))
f2 <- factor(sample(1:4, 10, replace = TRUE))
f3 <- factor(sample(1:4, 10, replace = TRUE))
df <- data.frame(id = 1:10, f1, f2, f3)

df$f4 <- factor(apply(df[-1], 1, function(x) {
   # Merge level 2 into level 1, then count against the three groups 1, 3, 4;
   # the position of the group with a count above 1 is the new value
   y <- which(table(factor(replace(as.numeric(x), x == "2", 1), c(1, 3, 4))) > 1)
   if(length(y) == 0) 4 else y
}))

df
#>    id f1 f2 f3 f4
#> 1   1  1  2  4  1
#> 2   2  1  3  2  1
#> 3   3  4  4  2  3
#> 4   4  1  4  2  1
#> 5   5  1  1  1  1
#> 6   6  1  3  3  2
#> 7   7  3  1  3  2
#> 8   8  1  3  4  4
#> 9   9  4  2  1  1
#> 10 10  2  3  3  2

Created on 2020-12-08 by the reprex package (v0.3.0)
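
To make the counting idea easier to follow, here is a sketch that unpacks it for a single row. The helper names count_groups and classify_one are hypothetical and only serve the illustration; the sketch assumes the row arrives as a character vector of level labels, which is what apply() passes when the columns are factors.

```r
# Hypothetical helper: count how many values fall into each of the three groups.
count_groups <- function(x) {
  recoded <- replace(as.numeric(x), x == "2", 1)  # treat level 2 as level 1
  table(factor(recoded, levels = c(1, 3, 4)))     # counts for groups 1/2, 3, 4
}

# Hypothetical helper: position of the group that occurs at least twice, else 4.
classify_one <- function(x) {
  hit <- which(count_groups(x) > 1)
  if (length(hit) == 0) 4 else unname(hit)
}

classify_one(c("2", "1", "3"))  # 1: two values in levels 1 or 2
classify_one(c("1", "3", "3"))  # 2: two values in level 3
classify_one(c("4", "2", "4"))  # 3: two values in level 4
classify_one(c("1", "3", "4"))  # 4: no group occurs twice
```

Because the groups are tabulated in the order 1, 3, 4, the position returned by which() is directly the value the question asks for.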

Upvotes: 1
