vintdeux

Reputation: 47

Is there an equivalent of the 'match' function in R that works with regex?

The advantage of 'match' is that it returns the matching indices from the lexicon; the disadvantage is that it doesn't accept regex.

Corpus <- c('animalada', 'fe', 'fernandez', 'ladrillo')
Lexicon <- c('animal', 'animalada', 'fe', 'fernandez', 'ladr', 'ladrillo')

Index <- match(Corpus, Lexicon)

match returns, for each element of Corpus, its index in the dictionary (Lexicon):

Index
# [1] 2 3 4 6

Lexicon[Index]
# [1] "animalada" "fe"        "fernandez" "ladrillo"

I need to work with a dictionary that contains regular expressions:

Lexicon <- c('anima.+$', '.*ez$', '^fe.*$', 'ladr.*$')

The problem: the 'match' function doesn't work with regex!

Upvotes: 1

Views: 325

Answers (2)

Jeff

Reputation: 724

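Following up on @Maël's answer, if you need the actual values from Lexicon returned, here is a useful idiom (provided the relationship between the two lists is one-to-one, i.e. each Corpus element matches exactly one pattern; otherwise sapply returns a list, which cannot be used as a subscript):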

library(stringr)
Lexicon[sapply(Corpus, \(x) str_which(x, Lexicon))]
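
When the one-to-one assumption does not hold (here 'fernandez' matches two patterns), one possible workaround is to keep only the first matching pattern, which mimics how 'match' returns a single index or NA. The following is a minimal base R sketch, not a definitive implementation; first_regex_match is a hypothetical helper name introduced for illustration.

```r
Corpus  <- c('animalada', 'fe', 'fernandez', 'ladrillo')
Lexicon <- c('anima.+$', '.*ez$', '^fe.*$', 'ladr.*$')

# Hypothetical helper: index of the first pattern that matches `word`,
# or NA if no pattern matches (similar in spirit to match())
first_regex_match <- function(word, patterns) {
  hits <- which(vapply(patterns, function(p) grepl(p, word),
                       logical(1), USE.NAMES = FALSE))
  if (length(hits) > 0) hits[1] else NA_integer_
}

Index <- vapply(Corpus, first_regex_match, integer(1),
                patterns = Lexicon, USE.NAMES = FALSE)

Index
# [1] 1 3 2 4

Lexicon[Index]
```

Taking the first match is a design choice: the order of Lexicon then decides which pattern wins when several apply.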

Upvotes: 0

Maël

Reputation: 52069

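Use str_which + sapply. For each Corpus element, str_which returns the indices of the Lexicon patterns that match it. Note that one value can match several regexes (and one regex can apply to several values), hence the list.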

library(stringr)
sapply(Corpus, \(x) str_which(x, Lexicon))

# $animalada
# [1] 1
# 
# $fe
# [1] 3
# 
# $fernandez
# [1] 2 3
# 
# $ladrillo
# [1] 4
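
If adding a stringr dependency is not desirable, roughly the same result can be obtained in base R with grepl. This is a sketch under the assumption that the patterns are valid regular expressions for base R's default regex engine; the names hits and all_matches are introduced here for illustration only.

```r
Corpus  <- c('animalada', 'fe', 'fernandez', 'ladrillo')
Lexicon <- c('anima.+$', '.*ez$', '^fe.*$', 'ladr.*$')

# Logical matrix: one row per Corpus word, one column per Lexicon pattern
hits <- vapply(Lexicon, function(p) grepl(p, Corpus),
               logical(length(Corpus)))
dimnames(hits) <- list(Corpus, Lexicon)

# For each word, the indices of all patterns that match it
all_matches <- lapply(seq_along(Corpus),
                      function(i) unname(which(hits[i, ])))
names(all_matches) <- Corpus

all_matches$fernandez
# [1] 2 3
```

The intermediate matrix also makes it easy to see which words match no pattern at all (rows that are entirely FALSE).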

Upvotes: 2
