Reputation: 343
Assuming I have 2 array of strings position1 = ['word1', 'word2', 'word3'] position2 = ['word4', 'word1']
and I want inside a text/string to check if the substring #{target} which exists in text is followed by either one of the words of position1 or following one of the words of the position2 or even both at the same time. Similarly as if I am looking left and right of #{target}.
For example in the sentence "Writing reports and inputting data onto internal systems, with regards to enforcement and immigration papers" if the target word is data I would like to check if the word left (inputting) and right (onto) are included in the arrays or if one of the words in the arrays return true for the regex match. Any suggestions? I am using Ruby and I have tried some regex but I can't make it work yet. I also have to ignore any potential special characters in between.
One of them:
/^.*\b(#{joined_position1})\b.*$[\s,.:-_]*\b#{target}\b[\s,.:-_\\\/]*^.*\b(#{joined_position2})\b.*$/i
Edit:
I figured out this way with regex to capture the word left and right:
(\S+)\s*#{target}\s*(\S+)
However what could I change if I would like to capture more than one words left and right?
Upvotes: 0
Views: 251
Reputation: 4874
If you have two arrays of strings, what you can do is something like this:
matches = /^.+ (\S+) #{target} (\S+) .+$/.match(text)
if matches and (position1.include?(matches[1]) or position2.include?(matches[2]))
do_something()
end
What this regex does is match the target word in your text and extract the words next to it using capture groups. The code then compares those words against your arrays, and does something if they're in the right places. A more general version of this might look like:
def checkWords(target, text, leftArray, rightArray, numLeft = 1, numRight = 1)
# Build the regex
regex = "^.+"
regex += " (\S+)" * numLeft
regex += " #{target}"
regex += " (\S+)" * numRight
regex += " .+$"
pattern = Regexp.new(regex)
matches = pattern.match(text)
return false if !matches
for i in 1..numLeft
return false if (!leftArray.include?(matches[i]))
end
for i in 1..numRight
return false if (!rightArray.include?(matches[numLeft + i]))
end
return true
end
Which can then be invoked like this:
do_something() if checkWords("data", text, position1, position2, 2, 2)
I'm pretty sure it's not terribly idiomatic, but it gives you a general sense of how you would do what you in a more general way.
Upvotes: 1