SDahm

Reputation: 436

Create a new variable in data frame depending on two other variables

I have a large data frame and want to create a new variable which depends on two other variables.

Here is a short example:

v1 <- rep(c(1:5),each=3)
v2 <- c('X','A','Y','X','Y','B','X','Y','C','X','Y','C','X','Y','A')

dat <- data.frame(v1,v2)

# create a new variable that contains A, B, or C, depending on which one is found in v2 within each v1 group


#desired output
v3 <- rep(c('A','B','C','C','A'),each=3)
data.frame(v1,v2,v3)

Any ideas on how to do this concisely?

I tried this, but it's far from the solution: it only fills the rows where v2 is already A, B, or C and leaves too many missing values. :(

dat$v3[dat$v2 %in% c('A','B','C')] <- dat$v2[dat$v2 %in% c('A','B','C')]

Upvotes: 0

Views: 129

Answers (1)

Julius Vainora

Reputation: 48191

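library(tidyverse)

# Within each v1 group, find the one value of v2 that is A, B, or C;
# mutate() recycles that single value across every row of the group.
dat %>%
  group_by(v1) %>%
  mutate(v3 = intersect(v2, c("A", "B", "C")))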
# A tibble: 15 x 3
# Groups:   v1 [5]
#       v1 v2    v3   
#    <int> <fct> <chr>
#  1     1 X     A    
#  2     1 A     A    
#  3     1 Y     A    
#  4     2 X     B    
#  5     2 Y     B    
#  6     2 B     B    
#  7     3 X     C    
#  8     3 Y     C    
#  9     3 C     C    
# 10     4 X     C    
# 11     4 Y     C    
# 12     4 C     C    
# 13     5 X     A    
# 14     5 Y     A    
# 15     5 A     A    

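This assumes that exactly one of A, B, C appears in each group defined by v1. If a group contains none of them or more than one, intersect() returns zero or several values, and mutate() can no longer recycle a single value across the group.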
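
If that assumption might not hold, a more forgiving variant is possible. The following is only a sketch in base R using ave() instead of dplyr; first_match is a hypothetical helper name, not part of the answer above, and the data frame is built with stringsAsFactors = FALSE so that v2 is a character column. It takes the first of A, B, C found in each group and yields NA for groups that contain none of them.

```r
v1 <- rep(1:5, each = 3)
v2 <- c("X", "A", "Y", "X", "Y", "B", "X", "Y", "C", "X", "Y", "C", "X", "Y", "A")
dat <- data.frame(v1, v2, stringsAsFactors = FALSE)

# Hypothetical helper: the first element of x that is one of `keys`,
# or NA when x contains none of them (indexing an empty vector with [1] gives NA).
first_match <- function(x, keys = c("A", "B", "C")) {
  x <- as.character(x)
  x[x %in% keys][1]
}

# ave() applies the helper within each v1 group and recycles the
# single result across all rows of that group.
dat$v3 <- ave(dat$v2, dat$v1, FUN = first_match)
```

On the example data this should give the same v3 as the desired output in the question. When a group holds several of A, B, C, the one that appears first in v2 wins, which may or may not be the rule you want.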

Upvotes: 2
