Reputation: 57
I have two character vectors, x and y. The elements of x are (potential) substrings of the elements of y, and both vectors contain duplicate values. For each element of x, I want to return the index of its first match in y (if there is one), where the substring must match at the beginning of the string (cf. the ^ anchor in regex), e.g.:
x <- c("Halimid", "Halimid", "Callimid", "Diplid", "Halimid", "Cyathid")
y <- c("Bathymidae", "Bathymidae", "Halimidopidae", "Cyathidae", "Bothridae", "Cyathidae", "Diplididae", "Holothuridae")
# desired: some_function(x, y) gives the first match in y for each element of x, if there is a match
# 3, 3, NA, 7, 3, 4
i.e. the function should return a vector of the same length as x, containing the index of the first match in y, or NA for elements without a match. I've already tried base::startsWith(), but it only works for a single substring, and pmatch() hasn't worked for me either. I want to avoid apply and loops if possible, so vectorized solutions are preferred.
Upvotes: 0
Views: 467
Reputation: 11548
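Using a traditional for loop, with startsWith() so that each match is anchored at the beginning of the string:
v <- integer(length(x))  # preallocate instead of growing v
for (i in seq_along(x)) {
  # first index in y whose string starts with x[i]; NA if there is none
  v[i] <- which(startsWith(y, x[i]))[1]
}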
v
[1] 3 3 NA 7 3 4
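To illustrate why the anchor matters, here is a small sketch comparing an unanchored grep() with a prefix match. The vector y_demo and the patterns "thy" and "Bathy" are made up for illustration and are not part of the question's data.

```r
# Hypothetical demo data (not from the question)
y_demo <- c("Bathymidae", "Halimidopidae", "Cyathidae")

# Unanchored: "thy" is found in the middle of "Bathymidae"
unanchored <- grep("thy", y_demo)

# Anchored with ^: no string starts with "thy", so nothing matches
anchored <- grep("^thy", y_demo)

# startsWith() is always a prefix test and treats the pattern as fixed text
prefix_hit <- which(startsWith(y_demo, "Bathy"))[1]

# Taking [1] of an empty result yields NA, which is what the question asks for
no_hit <- which(startsWith(y_demo, "thy"))[1]
```

With the question's data both variants happen to give the same result, because none of the elements of x occurs in the middle of an element of y.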
Upvotes: 0
Reputation: 6663
I can't think of a solution without lapply() or purrr::map(). I'm not sure if those are acceptable for you, but they are quite simple, so here we go:
x <- c("Halimid", "Halimid", "Callimid", "Diplid", "Halimid", "Cyathid")
y <- c("Bathymidae", "Bathymidae", "Halimidopidae", "Cyathidae", "Bothridae", "Cyathidae", "Diplididae", "Holothuridae")
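Using lapply() and grep(), with ^ prepended to each pattern so that the match is anchored at the beginning of the string.
a <- lapply(x, function(z) grep(paste0("^", z), y)[1])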
unlist(a)
#> [1] 3 3 NA 7 3 4
Using map_dbl() we can make the code look a bit simpler, but it is essentially the same.
library(purrr)
map_dbl(x, ~grep(paste0("^", .x), y)[1])
#> [1] 3 3 NA 7 3 4
Created on 2020-11-02 by the reprex package (v0.3.0)
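If an explicit loop or apply-style call really has to be avoided, one possible sketch is to build the full prefix-match matrix with outer() and startsWith(), then pick the first matching column per row. The function name first_prefix_match is made up here, and the approach assumes the length(x) by length(y) logical matrix fits in memory, so it may not be a good fit for very long vectors.

```r
x <- c("Halimid", "Halimid", "Callimid", "Diplid", "Halimid", "Cyathid")
y <- c("Bathymidae", "Bathymidae", "Halimidopidae", "Cyathidae",
       "Bothridae", "Cyathidae", "Diplididae", "Holothuridae")

# Hypothetical helper: index of the first element of y that starts with x[i]
first_prefix_match <- function(x, y) {
  # m[i, j] is TRUE when y[j] starts with x[i]
  m <- outer(x, y, function(prefix, string) startsWith(string, prefix))
  # which() walks the matrix column by column, so for any given row
  # the hits are listed in increasing column (i.e. y index) order
  hits <- which(m, arr.ind = TRUE)
  # match() returns the first hit per row, or NA when a row has no hit
  unname(hits[match(seq_along(x), hits[, "row"]), "col"])
}

res <- first_prefix_match(x, y)
res
# expected: 3 3 NA 7 3 4
```

Because startsWith() treats the prefix as fixed text, this also sidesteps any regex metacharacters that might appear in x.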
Upvotes: 1