Reputation: 41
Here's my text:
"A popular resource for the Christian community in the Asheville area."
"I love the acting community in the Orange County area."
I'd like to capture "Asheville" and "Orange County". How can I start capturing from the "the" that is closest to "area"?
Here's my regex:
/the (.+?) area/
They capture:
"Christian community in the Asheville"
"acting community in the Orange County"
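A minimal Ruby sketch reproducing the problem, using only the two sentences and the regex from the question. It shows why the lazy quantifier alone does not help: the engine tries start positions from left to right, so the match begins at the first "the " that can lead to a successful match, and +? only keeps the end of the match short.

```ruby
# Reproduce the question: the lazy group still starts at the FIRST usable "the ",
# because the regex engine scans start positions left to right.
lazy = /the (.+?) area/

samples = [
  "A popular resource for the Christian community in the Asheville area.",
  "I love the acting community in the Orange County area."
]

# String#[] with a regex and an index returns that capture group (or nil).
captures = samples.map { |s| s[lazy, 1] }
captures.each { |c| puts c }
# Christian community in the Asheville
# acting community in the Orange County
```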
Upvotes: 3
Views: 84
Reputation: 21
(?<=in the)(.*)(?=area)
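(?<=...) is a lookbehind and (?=...) is a lookahead. Both are zero-width assertions: they check that the given text is there but do not include it in the match, so 'in the' and 'area' are excluded from the result. Note that the space after 'in the' and the space before 'area' do end up in the match; move them inside the lookarounds, (?<=in the )(.*)(?= area), to get exactly Asheville.
(.*) is used here, which is greedy, but you can use (.*?) to stop at the first occurrence of the text in the lookahead.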
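A small Ruby sketch of the lookaround approach on the first sentence from the question. Ruby supports fixed-length lookbehind such as (?<=in the ), and because lookarounds are zero-width, String#[] returns only the text between them. The second pattern is a variant with the spaces moved into the lookarounds and a lazy group; it is an adjustment, not the answer's exact regex.

```ruby
s = "A popular resource for the Christian community in the Asheville area."

# Pattern as posted: the spaces around the name fall inside the match.
as_posted = s[/(?<=in the)(.*)(?=area)/]

# Variant: spaces moved into the lookarounds, lazy group stops at the first " area".
trimmed = s[/(?<=in the )(.*?)(?= area)/]

p as_posted  # => " Asheville "
p trimmed    # => "Asheville"
```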
Upvotes: 2
Reputation: 4981
Use a tempered greedy solution, so that the matched text doesn't contain another the. That way it will always match from the last the before area:
/the (?:(?!the).)+? area/
(?:(?!the).)+? represents a tempered greedy dot, which matches any character that does not start the text the. This is enforced by the negative lookahead (?!the), which is checked before each character is consumed and tells the engine not to match the text the. Thus the text matched between the and area never contains another the.
Another way would be to put the and area in a lookbehind and a lookahead, though that will be a bit slower than a capturing group. Read more about the tempered greedy token and when to use it.
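A minimal Ruby sketch of both variants described in this answer, run on the question's two sentences: the tempered greedy dot as posted (whole match, no capture group), and the lookbehind/lookahead form, which returns only the name. The exact lookaround pattern is an assumption about how that alternative would be written.

```ruby
# Tempered greedy dot as posted: the whole match includes "the " and " area".
tempered = /the (?:(?!the).)+? area/
# Lookaround variant (assumed form): only the text between them is returned.
lookaround = /(?<=the )(?:(?!the).)+?(?= area)/

samples = [
  "A popular resource for the Christian community in the Asheville area.",
  "I love the acting community in the Orange County area."
]

full_matches = samples.map { |s| s[tempered] }
inner_only   = samples.map { |s| s[lookaround] }

p full_matches  # => ["the Asheville area", "the Orange County area"]
p inner_only    # => ["Asheville", "Orange County"]
```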
Upvotes: 2
Reputation: 626851
Use a (?:(?!the).)+? tempered greedy token:
/the ((?:(?!the).)+?) area/
It is almost the same as /the ([^t]*(?:t(?!he)[^t]*)*?) area/ (which also allows an empty capture), but the latter is a bit more efficient since it is an unrolled pattern.
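A quick Ruby check, as a sketch, that the tempered token and the unrolled pattern from this answer produce the same capture on the question's two sentences. It only compares results on these inputs; it does not measure the efficiency difference.

```ruby
tempered = /the ((?:(?!the).)+?) area/
# Unrolled form: runs of non-"t" chars, plus any "t" not followed by "he".
unrolled = /the ([^t]*(?:t(?!he)[^t]*)*?) area/

samples = [
  "A popular resource for the Christian community in the Asheville area.",
  "I love the acting community in the Orange County area."
]

tempered_caps = samples.map { |s| s[tempered, 1] }
unrolled_caps = samples.map { |s| s[unrolled, 1] }

p tempered_caps                   # => ["Asheville", "Orange County"]
p tempered_caps == unrolled_caps  # => true
```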
The (?:(?!the).)+? token matches one or more characters, as few as possible, none of which starts a the character sequence.
To make it safer, add word boundaries to only match whole words:
/\bthe ((?:(?!\bthe\b).)+?) area\b/
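A Ruby sketch of why the word boundaries matter. The sentence is a made-up example, not from the question: "theater" begins with the letters "the", so the plain tempered token refuses to consume it, while (?!\bthe\b) only rejects "the" as a whole word.

```ruby
plain   = /the ((?:(?!the).)+?) area/
bounded = /\bthe ((?:(?!\bthe\b).)+?) area\b/

# Hypothetical input where a word starts with "the".
s = "Meet me in the theater area."

p s[plain, 1]    # => nil (the token cannot step over "the" inside "theater")
p s[bounded, 1]  # => "theater"
```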
Ruby demo:
s = 'I love the acting community in the Orange County area.'
puts s[/the ((?:(?!the).)+?) area/,1]
# => Orange County
NOTE: if you expect the match to span across multiple lines, do not forget to add the /m modifier (in Ruby it makes . match newlines too):
/the ((?:(?!the).)+?) area/m
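A short Ruby sketch of the /m note, using a hypothetical variant of the second sentence with a line break inside the name. Without /m, . does not match "\n" in Ruby, so the tempered token cannot cross the newline.

```ruby
# Hypothetical input: the name is split across two lines.
s = "I love the acting community in the Orange\nCounty area."

without_m = s[/the ((?:(?!the).)+?) area/, 1]
with_m    = s[/the ((?:(?!the).)+?) area/m, 1]

p without_m  # => nil
p with_m     # => "Orange\nCounty"
```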
Upvotes: 2