Reputation: 23
I have a file in which I want to match a certain word between keywords using regular expressions. For example, let's say I want to match every occurrence of the word "dog" AFTER the keyword "start" and BEFORE the keyword "end".
dog horse animal cat dog // <-- don't match
random text dog // <-- don't match
start
brown dog
black dog
cat horse animals
end
dog cat // <-- don't match
good dog // <-- don't match
Maybe regex has a pipe feature where I can get the text after the word "start" and before the word "end", then pipe it into a new regular expression? Then I could just search for "dog" in the second regular expression. I am new to regular expressions and have been struggling to come up with a solution. Thanks
Upvotes: 1
Views: 3684
Reputation: 1949
When you are matching "globally" (i.e. collecting several matches that are non-contiguous) and you add a stipulation such as "matches must all exist in a container" (in this case, between "start" and "end"), this generally calls for a construct such as PCRE's '\G', which matches only at the position where the previous match ended (or at the very start of the subject on the first attempt):
(?:\G(?!\A)|start)(?:(?!end).)*?\Kdog
See it in action at: https://regex101.com/r/uV7EjE/1
It's important to note that this uses constructs that are not universally supported: '\G', and '\K', which is available in PCRE and Perl but in few other engines. An explanation of each part:
/(?:
    \G(?!\A)      # Continue only from where the previous match ended; (?!\A) rules out the start of the subject, so we can only get here after a valid "dog".
  |
    start         # Or match "start" to enter the container.
)
(?:(?!end).)*?    # Match as few characters as possible, making sure we don't step over "end".
\K                # Reset the start of the reported match, so everything consumed before this is left out of the result.
dog               # Match what we want.
/gmsx
If instead you need something that works in more basic regex engines, then you do indeed need to pipe a simpler expression. For instance, use
start.*?end
(with the dot allowed to match newlines) to match a complete group, then check its contents for all occurrences of "dog".
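The two-step approach described above can be sketched in Python with only the standard `re` module. This is a minimal illustration, not a definitive implementation: the function name `find_between`, its parameters, and the `sample` text are hypothetical, and the `\b` word boundaries are an added assumption so that "end" inside a longer word (such as "friend") is not treated as the closing keyword.

```python
import re

def find_between(text, word="dog", start="start", end="end"):
    """Return the offsets in `text` of every `word` inside a start...end block."""
    # Step 1: capture each block lazily; re.DOTALL lets "." cross line breaks.
    # The \b word boundaries are an assumption, not part of the original pattern.
    block_re = re.compile(
        r"\b" + re.escape(start) + r"\b(.*?)\b" + re.escape(end) + r"\b",
        re.DOTALL,
    )
    # Step 2: a second, simpler expression that is run only on block contents.
    word_re = re.compile(r"\b" + re.escape(word) + r"\b")

    offsets = []
    for block in block_re.finditer(text):
        inner_start = block.start(1)  # where the captured contents begin in `text`
        for hit in word_re.finditer(block.group(1)):
            offsets.append(inner_start + hit.start())
    return offsets

# Hypothetical sample mirroring the text in the question.
sample = "\n".join([
    "dog horse animal cat dog",
    "random text dog",
    "start",
    "brown dog",
    "black dog",
    "cat horse animals",
    "end",
    "dog cat",
    "good dog",
])

for offset in find_between(sample):
    print(offset, sample[offset:offset + 3])
```

Only the two "dog" occurrences between "start" and "end" are reported; the ones before "start" and after "end" are never seen by the second expression. Because the first pattern is lazy, a file with several start...end blocks is handled one block at a time.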
Upvotes: 6
Reputation: 3867
Update:
start(.*?)(dog)+(.*?)end
You can test it at the link below.
Previous answer (please note this might not exactly answer your case, because it depends heavily on which language you are working in):
As the other comments say, it also depends on the language you are developing in. If you let me know which one you are using, I might be able to give you a better answer.
You can also use https://regex101.com/ to debug your expression.
Upvotes: 1
Reputation: 3171
I know you're asking for regex, but if you're using a certain language there may be more apt solutions. For example, in PHP this function would work:
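function getStringBetween($string, $start, $end) {
    // Locate the start keyword; strpos() returns false when it is absent.
    $ini = strpos($string, $start);
    if ($ini === false) return "";
    $ini += strlen($start);
    // Locate the end keyword, searching only after the start keyword.
    $stop = strpos($string, $end, $ini);
    if ($stop === false) return "";
    // Return just the text between the two keywords.
    return substr($string, $ini, $stop - $ini);
}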
Upvotes: -1