Michael

Reputation: 23

Find words in between two words in regular expressions

I have a file where I want to match a certain word between keywords using regular expressions. For example, let's say I want to match every occurrence of the word "dog" AFTER the keyword "start" and BEFORE the keyword "end".

dog horse animal cat dog // <-- don't match 
random text dog   // <-- don't match
start 

brown dog
black dog
cat horse animals

end 
dog cat // <-- don't match
good dog    // <-- don't match

Maybe regex has a pipe feature where I can get the text after the word "start" and before the word "end", then pipe it into a new regular expression? Then I could just search for "dog" in the second regular expression. I am new to regular expressions and have been struggling to come up with a solution. Thanks

Upvotes: 1

Views: 3684

Answers (3)

jaytea

Reputation: 1949

When you are matching "globally" (i.e. collecting several non-contiguous matches) and you add a stipulation such as "all matches must lie inside a container" (in this case, between "start" and "end"), this generally calls for a construct such as PCRE's '\G', which matches only at the position where the previous match ended (or at the start of the subject on the first attempt):

(?:\G(?!\A)|start)(?:(?!end).)*?\Kdog

See it in action at: https://regex101.com/r/uV7EjE/1

It's important to note that this uses some constructs that are not universally supported, including '\K', which is available in only a few engines such as PCRE and Perl. An explanation of each part:

/(?:
\G(?!\A)        # Match only where the previous match ended, rather than at any position the engine would otherwise try; (?!\A) excludes the very start of the subject. In effect, this ensures we only continue immediately after the last valid "dog".
|start          # Or match "start".
)
(?:(?!end).)*?  # Match as few characters as possible, making sure we don't encounter "end".
\K              # Reset the start of the reported match so everything before this isn't part of it.
dog             # Match what we want.
/gmsx

If instead you need something that works in more basic regex engines, then you do indeed need to chain two steps: first use a simpler expression, for instance start.*?end (with the dot allowed to match newlines), to match a complete group, then check its contents for all occurrences of "dog".
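As a rough illustration of that two-step approach, here is a minimal sketch in Python, whose built-in re module supports neither '\G' nor '\K'. The function name find_between and the sample text are made up for this example; the keywords are taken from the question.

```python
import re

def find_between(text, word="dog", start="start", end="end"):
    """Return the offsets in `text` of every `word` found between `start` and `end`."""
    # Step 1: lazily match each start...end block; DOTALL lets "." cross newlines.
    block_re = re.compile(re.escape(start) + r"(.*?)" + re.escape(end), re.DOTALL)
    word_re = re.compile(re.escape(word))
    offsets = []
    for block in block_re.finditer(text):
        # Step 2: search only inside the captured block contents.
        for hit in word_re.finditer(block.group(1)):
            # Translate the position inside the block back to a position in `text`.
            offsets.append(block.start(1) + hit.start())
    return offsets

# Hypothetical sample modelled on the question's input.
sample = """dog horse animal cat dog
random text dog
start

brown dog
black dog
cat horse animals

end
dog cat
good dog
"""

offsets = find_between(sample)
print(len(offsets))  # 2
```

Note that this sketch matches the keywords as plain substrings, so "endless" would also close a block. If that matters for your data, wrapping the keywords in \b word boundaries is one option.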

Upvotes: 6

oetoni

Reputation: 3867

Update:

start(.*?)(dog)+(.*?)end

You can test it at the link below.

Previous: please note that this might not exactly answer your case, because it depends heavily on which language you are working in.

It also depends on the language you are developing in, as the other comments say. If you let me know which language you are using, I might be able to give you a better answer.

You can also use https://regex101.com/ to debug your expression.

Upvotes: 1

Bing

Reputation: 3171

I know you're asking for regex, but if you're using a certain language there may be more apt solutions. For example, in PHP this function would work:

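function getStringBetween($string, $start, $end) {
    // Locate the start keyword; strpos() returns false when it is absent.
    $ini = strpos($string, $start);
    if ($ini === false) return "";
    // Skip past the start keyword itself.
    $ini += strlen($start);
    // Locate the end keyword, searching only after the start keyword.
    $endPos = strpos($string, $end, $ini);
    if ($endPos === false) return "";
    return substr($string, $ini, $endPos - $ini);
}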

Upvotes: -1
