Reputation: 47
The advantage of 'match' is that it returns the matching indices from the lexicon; the disadvantage is that it doesn't accept regex.
Corpus<- c('animalada', 'fe', 'fernandez', 'ladrillo')
Lexicon<- c('animal', 'animalada', 'fe', 'fernandez', 'ladr', 'ladrillo')
Index <- match(Corpus, Lexicon)
'match' returns the indices in the dictionary:
Index
# [1] 2 3 4 6
Lexicon[Index]
# [1] "animalada" "fe" "fernandez" "ladrillo"
I need to work with a dictionary whose entries are regular expressions:
Lexicon<- c('anima.+$', '.*ez$', '^fe.*$', 'ladr.*$')
The problem: 'match' only compares values exactly, so it doesn't work with regex patterns!
Upvotes: 1
Views: 325
Reputation: 724
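Following up on @Maël's answer, if you need the actual values from Lexicon returned, here is a useful idiom (provided each element of Corpus matches exactly one pattern in Lexicon; otherwise sapply returns a list, and Lexicon cannot be subset by a list. With the example data, 'fernandez' matches two patterns, so the idiom fails there):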
library(stringr)
Lexicon[sapply(Corpus, \(x) str_which(x, Lexicon))]
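If a word can match several patterns (as 'fernandez' does here), one option is to return the matching patterns per word as a list instead of indexing Lexicon directly. The sketch below uses base R grepl instead of stringr; the name matched_patterns is just for illustration. Since grepl only uses the first pattern it is given, the patterns are looped over with vapply.

```r
Corpus  <- c('animalada', 'fe', 'fernandez', 'ladrillo')
Lexicon <- c('anima.+$', '.*ez$', '^fe.*$', 'ladr.*$')

# For each word, keep every pattern in Lexicon that matches it.
# grepl() only uses its first pattern, so test the patterns one at a
# time with vapply(); `x = x` passes the word as grepl's `x` argument.
matched_patterns <- lapply(setNames(Corpus, Corpus), function(x) {
  hits <- vapply(Lexicon, grepl, logical(1), x = x, USE.NAMES = FALSE)
  Lexicon[hits]
})

# Both '.*ez$' and '^fe.*$' match 'fernandez'
matched_patterns$fernandez
```

The result is a named list, so it works whether the relationship is one-to-one or one-to-many.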
Upvotes: 0
Reputation: 52069
Use str_which + sapply. Note that one word can match several regexes (here 'fernandez' matches both '.*ez$' and '^fe.*$'), hence the list.
library(stringr)
sapply(Corpus, \(x) str_which(x, Lexicon))
# $animalada
# [1] 1
#
# $fe
# [1] 3
#
# $fernandez
# [1] 2 3
#
# $ladrillo
# [1] 4
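If you want something that behaves more like 'match' itself (one index per word, NA when nothing matches), a small wrapper can keep only the first matching pattern. This is a hypothetical helper, not a base R or stringr function; it uses base R grepl and, like 'match', silently drops any later matches (so 'fernandez' gets index 2 only).

```r
Corpus  <- c('animalada', 'fe', 'fernandez', 'ladrillo')
Lexicon <- c('anima.+$', '.*ez$', '^fe.*$', 'ladr.*$')

# Hypothetical helper: like match(), but each element of `table` is a
# regular expression. Returns the index of the FIRST pattern that
# matches each element of x, or `nomatch` if none does.
regex_match <- function(x, table, nomatch = NA_integer_) {
  nomatch <- as.integer(nomatch)  # vapply() below requires integer results
  vapply(x, function(s) {
    hits <- which(vapply(table, grepl, logical(1), x = s, USE.NAMES = FALSE))
    if (length(hits) > 0) hits[[1]] else nomatch
  }, integer(1), USE.NAMES = FALSE)
}

Index <- regex_match(Corpus, Lexicon)
Index
# [1] 1 3 2 4
Lexicon[Index]
```

Because the result is always one integer per word, Lexicon[Index] works just as in the original 'match' example.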
Upvotes: 2