ltong
ltong

Reputation: 543

How to check if a value exists within a set of columns?

My dataframe looks something like the following, where there are >100 columns that starts with "i10_" and many other columns with other data. I would like to create a new variable that tells me whether the values C7931 and C7932 are in each row within only the columns that start with "i10_". I would like to create a new variable that states TRUE or FALSE depending on whether the value exists in that row or not.

enter image description here

So the output would be c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE)

Upvotes: 1

Views: 1340

Answers (3)

Ammar Gamal
Ammar Gamal

Reputation: 211

Similar approach with dplyr::across()

my_eval<-c("C7932","C7931")
d1%>%
mutate(is_it_here=
 rowSums(across(starts_with("i10_"),
 ~. %in% my_eval))!=0)

Upvotes: 1

Sotos
Sotos

Reputation: 51592

Create a vector with the columns of interest and use rowSums(), i.e.

i1 <- grep('i10_', names(d1))
rowSums(d1[i1] == 'C7931' | d1[i1] == 'C7932', na.rm = TRUE) > 0

where,

d1 <- structure(list(v1 = c("A", "B", "C", "D", "E", "F"), i10_a = c(NA, 
"C7931", NA, NA, "S272XXA", "R55"), i10_1 = c("C7931", "C7931", 
"R079", "S272XXA", "S234sfs", "N179")), class = "data.frame", row.names = c(NA, 
-6L))

Upvotes: 1

brendbech
brendbech

Reputation: 419

Ideally, you would give us a reproducible example with dput(). Assuming your dataframe is called df, you can do something like this with only base.

df$present <- apply(
  df[, c(substr(names(df), 1, 3) == "i10")],
  MARGIN = 1,
  FUN = function(x){"C7931" %in% x & "C7932" %in% x})

This will go row by row and check columns that start with i10 if they contain "C7931" and "C7932".

Upvotes: 1

Related Questions