Reputation: 543
My data frame looks something like the following, with more than 100 columns that start with "i10_" and many other columns holding other data. I would like to create a new variable that is TRUE or FALSE for each row, depending on whether the values C7931 and C7932 appear in that row, looking only at the columns that start with "i10_".
So the output would be c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE)
Upvotes: 1
Views: 1340
Reputation: 211
Similar approach with dplyr::across()
library(dplyr)
my_eval <- c("C7932", "C7931")
d1 %>%
  mutate(is_it_here =
    # count the matches per row across the i10_ columns; any match gives TRUE
    rowSums(across(starts_with("i10_"),
                   ~ . %in% my_eval)) != 0)
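As a possible variation on the snippet above, dplyr 1.0.4 and later provide if_any(), which expresses "at least one selected column matches" directly and avoids the rowSums() step. This is only a sketch: it reuses the sample d1 posted in the base R answer in this thread as stand-in data, and the name res is made up for illustration.

```r
library(dplyr)

# Stand-in data: the sample d1 from the base R answer in this thread
d1 <- data.frame(
  v1    = c("A", "B", "C", "D", "E", "F"),
  i10_a = c(NA, "C7931", NA, NA, "S272XXA", "R55"),
  i10_1 = c("C7931", "C7931", "R079", "S272XXA", "S234sfs", "N179"),
  stringsAsFactors = FALSE
)

my_eval <- c("C7932", "C7931")

# if_any() is TRUE for a row when the predicate holds in at least one
# of the selected columns; NA never matches under %in%
res <- d1 %>%
  mutate(is_it_here = if_any(starts_with("i10_"), ~ .x %in% my_eval))

res$is_it_here
```

For this sample data the new column is TRUE for the first two rows only. Swapping if_any() for if_all() would instead require a match in every selected column.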
Upvotes: 1
Reputation: 51592
Create a vector with the columns of interest and use rowSums(), i.e.
i1 <- grep('^i10_', names(d1))  # ^ anchors the match so only names that start with i10_ are picked
rowSums(d1[i1] == 'C7931' | d1[i1] == 'C7932', na.rm = TRUE) > 0
where,
d1 <- structure(list(v1 = c("A", "B", "C", "D", "E", "F"), i10_a = c(NA,
"C7931", NA, NA, "S272XXA", "R55"), i10_1 = c("C7931", "C7931",
"R079", "S272XXA", "S234sfs", "N179")), class = "data.frame", row.names = c(NA,
-6L))
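If the list of codes grows beyond two, chaining == comparisons gets unwieldy. One way to generalise the idea, sketched here with the same d1, is to test each column with %in% and combine the per-column results with Reduce(). The names codes and has_code are made up for illustration.

```r
# Same sample data as above, rebuilt so the snippet runs on its own
d1 <- data.frame(
  v1    = c("A", "B", "C", "D", "E", "F"),
  i10_a = c(NA, "C7931", NA, NA, "S272XXA", "R55"),
  i10_1 = c("C7931", "C7931", "R079", "S272XXA", "S234sfs", "N179"),
  stringsAsFactors = FALSE
)

codes <- c("C7931", "C7932")        # values to look for
i1 <- grep("^i10_", names(d1))      # positions of the columns starting with i10_

# One logical vector per column (NA counts as "not found" under %in%),
# then OR them together element-wise to get one value per row
d1$has_code <- Reduce(`|`, lapply(d1[i1], `%in%`, codes))

d1$has_code
```

For this sample data the result is TRUE for the first two rows and FALSE for the rest. The sketch assumes at least one column matches the pattern; with no matching columns Reduce() returns NULL.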
Upvotes: 1
Reputation: 419
Ideally, you would give us a reproducible example with dput(). Assuming your data frame is called df, you can do something like this with only base R.
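df$present <- apply(
  # drop = FALSE keeps a data frame even if only one column matches
  df[, startsWith(names(df), "i10_"), drop = FALSE],
  MARGIN = 1,
  FUN = function(x) "C7931" %in% x && "C7932" %in% x)
This goes row by row and checks whether the columns that start with i10_ contain both "C7931" and "C7932". If either value alone should count as a match, use || instead of &&.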
Upvotes: 1