R: By group, test if for each value of one variable, that value exists in another variable

Question

I have a data frame structured something like:

a <- c(1,1,1,2,2,2,3,3,3,3,4,4)
b <- c(1,2,3,1,2,3,1,2,3,4,1,2)
c <- c(NA, NA, 2, NA, 1, 1, NA, NA, 1, 1, NA, NA)

df <- data.frame(a,b,c)

Here, a and b together uniquely identify an observation. I want to create a new variable, d, that indicates whether each observation's value of b appears at least once in c within the same group of a. The desired d would be:

[1] 0 1 0 1 0 0 1 0 0 0 0 0

I can write a for loop that does the trick:

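df$d <- NA_real_  # initialize the new column
for (i in unique(df$a)) {
  in_group <- df$a == i  # rows belonging to group i
  for (j in df$b[in_group]) {
    df$d[in_group & df$b == j] <- ifelse(j %in% df$c[in_group], 1, 0)
  }
}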

But surely in R there must be a cleaner/faster way of achieving the same result?

MichaelChirico · Accepted Answer

Using data.table:

library(data.table)
setDT(df) # convert df to a data.table by reference (no copy)
# unary +() is code golf for as.integer() on a logical vector
df[ , d := +(b %in% c), by = a]
#     a b  c d
#  1: 1 1 NA 0
#  2: 1 2 NA 1
#  3: 1 3  2 0
#  4: 2 1 NA 1
#  5: 2 2  1 0
#  6: 2 3  1 0
#  7: 3 1 NA 1
#  8: 3 2 NA 0
#  9: 3 3  1 0
# 10: 3 4  1 0
# 11: 4 1 NA 0
# 12: 4 2 NA 0
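
To see what the grouped expression is doing, it can help to evaluate `%in%` by hand on a single group. The following is a small base R sketch on the question's data for the group a == 3; the name `g3` is introduced here purely for illustration and is not part of the original answer.

```r
a <- c(1,1,1,2,2,2,3,3,3,3,4,4)
b <- c(1,2,3,1,2,3,1,2,3,4,1,2)
c <- c(NA, NA, 2, NA, 1, 1, NA, NA, 1, 1, NA, NA)
df <- data.frame(a, b, c)

# Rows of one group only (g3 is an illustrative name)
g3 <- df[df$a == 3, ]

# Is each b in the group found anywhere among the group's c values?
g3$b %in% g3$c
# [1]  TRUE FALSE FALSE FALSE

# Unary plus coerces the logical vector to integer 0/1
+(g3$b %in% g3$c)
# [1] 1 0 0 0

# Caveat: %in% treats NA as a matchable value, so an NA in b
# would count as "present" whenever the group's c contains an NA
NA %in% g3$c
# [1] TRUE
```

In this data b has no missing values, so the NA caveat does not change the result, but it may matter on other data.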

Adding the dplyr version for those of that persuasion. All credit due to @akrun.

library(dplyr)
df %>% group_by(a) %>% mutate(d = +(b %in% c))
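
One thing to keep in mind with the dplyr version is that the result is not assigned and is still grouped by a. A possible variant, assuming you want a plain ungrouped result stored in a new object (the name `res` is made up for this sketch), might look like this:

```r
library(dplyr)

df <- data.frame(
  a = c(1,1,1,2,2,2,3,3,3,3,4,4),
  b = c(1,2,3,1,2,3,1,2,3,4,1,2),
  c = c(NA, NA, 2, NA, 1, 1, NA, NA, 1, 1, NA, NA)
)

res <- df %>%
  group_by(a) %>%
  mutate(d = as.integer(b %in% c)) %>%  # as.integer() is the explicit form of +()
  ungroup()                             # drop the grouping so later verbs act on all rows

res$d
# [1] 0 1 0 1 0 0 1 0 0 0 0 0
```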

And for posterity, a base R version as well (via @thelatemail below):

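# by() returns results one group at a time, so sort by group first
df <- df[order(df$a, df$b), ]
# + 0L converts the logical result to integer 0/1
df$d <- unlist(by(df, df$a, FUN = function(x) (x$b %in% x$c) + 0L))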
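
The by() approach relies on the rows being sorted by a. If you would rather not reorder the data, one possible alternative (not from the original answer, just a sketch) is to pass row indices through ave(), which writes each group's result back into the original row positions:

```r
df <- data.frame(
  a = c(1,1,1,2,2,2,3,3,3,3,4,4),
  b = c(1,2,3,1,2,3,1,2,3,4,1,2),
  c = c(NA, NA, 2, NA, 1, 1, NA, NA, 1, 1, NA, NA)
)

# ave() splits the row indices by group, applies FUN to each set of indices,
# and puts the results back in the original row order
df$d <- ave(seq_len(nrow(df)), df$a,
            FUN = function(i) as.integer(df$b[i] %in% df$c[i]))

df$d
# [1] 0 1 0 1 0 0 1 0 0 0 0 0
```

Because ave() only accepts a single vector, the trick here is to group the row numbers and look up b and c inside the function.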
