dyna

Reputation: 41

Regex that matches 2 words not separated by a 3rd word

I'm trying to create a regex that matches two words (in order), but only when a particular word or sequence of characters does not appear between them.

I need a match when "Audio" and "Spanish" are not separated by "<br />".

Test String:

Dolby Digital Audio 2.0 Language French<br /> Dolby Digital 5.1 
Audio Language Spanish<br /> Dolby Digital Audio Language 7.1 
English<br /> Subtitles Language Spanish <br />

False positive:

/Audio.*((?!\<br\ \>).).*Spanish/i

What am I doing wrong here?

Upvotes: 1

Views: 137

Answers (1)

KernelPanic

Reputation: 600

If I'm understanding your question correctly, you'd like to capture the text between "Audio" and "Spanish", unless that text contains <br />.

What's the problem?

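The first .* is greedy and unrestricted, so it can consume <br /> itself. The negative lookahead is then checked at only one position: it just needs a single character that doesn't start the forbidden string, and the space between <br /> and Spanish satisfies that.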
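
The false positive is easy to reproduce. Here is a minimal sketch using Python's re module. The question doesn't name a regex engine, so Python, the variable names, and joining the test string onto one line are all assumptions:

```python
import re

# Test string from the question, joined onto a single line (assumption).
text = (
    "Dolby Digital Audio 2.0 Language French<br /> Dolby Digital 5.1 "
    "Audio Language Spanish<br /> Dolby Digital Audio Language 7.1 "
    "English<br /> Subtitles Language Spanish <br />"
)

# The pattern from the question, minus the /.../i delimiters.
original = re.compile(r"Audio.*((?!\<br\ \>).).*Spanish", re.IGNORECASE)

match = original.search(text)

# The greedy .* runs from the first "Audio" to the last "Spanish",
# so the matched text still contains "<br />".
crosses_br = match is not None and "<br />" in match.group(0)
print(crosses_br)  # True
```

The lookahead guards only the single character inside the capture group. Everything the two .* swallow is unchecked.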

What to do to solve it?

Audio\s*((?:(?!<br\ \/>).)*?)\s*Spanish

Broken down a bit:

Audio
\s*
(                    # the capture group
  (?:
    (?!<br\ \/>).    # any character such that it doesn't begin the string "<br />"
  )*?                # 0+ times; lazy
)
\s*
Spanish

You can see it in action.
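
The broken-down form can also be run more or less as written. This is a sketch, assuming Python's re module with re.VERBOSE (the engine and the variable names are my choice, not something the question specifies). In verbose mode unescaped whitespace in the pattern is ignored, which is why the space in <br\ \/> is escaped:

```python
import re

tempered = re.compile(
    r"""
    Audio
    \s*
    (                    # capture group 1
      (?:
        (?!<br\ \/>).    # one character that does not start "<br />"
      )*?                # zero or more times, lazily
    )
    \s*
    Spanish
    """,
    re.VERBOSE | re.IGNORECASE,
)

# Test string from the question, joined onto a single line (assumption).
text = (
    "Dolby Digital Audio 2.0 Language French<br /> Dolby Digital 5.1 "
    "Audio Language Spanish<br /> Dolby Digital Audio Language 7.1 "
    "English<br /> Subtitles Language Spanish <br />"
)

# Only the "Audio Language Spanish" segment qualifies.
print(tempered.findall(text))  # ['Language']

# "Audio" and "Spanish" separated by "<br />": no match.
blocked = tempered.search(
    "Audio Language 7.1 English<br /> Subtitles Language Spanish"
)
print(blocked)  # None
```

Because the lookahead is re-checked before every character the group consumes, the match can never step over <br />.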


The above is an edited post; previous iterations:

Audio\s*((?!\s*\<br\ \/>).*?)\s*Spanish

Thanks to Christian for pointing out that the above would match if <br /> were preceded by non-space characters, e.g. Audio foo <br /> Spanish.

Audio\s*((?!.*\<br\ \/>).*?)\s*Spanish

This was still pretty flawed and failed if there was a trailing <br /> after "Spanish".
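
For comparison, here is a small sketch that runs both failure cases against all three iterations. It assumes Python's re module; the variable names and the second sample string are illustrative, not from the original post:

```python
import re

first_try = re.compile(r"Audio\s*((?!\s*\<br\ \/>).*?)\s*Spanish")
second_try = re.compile(r"Audio\s*((?!.*\<br\ \/>).*?)\s*Spanish")
final = re.compile(r"Audio\s*((?:(?!<br\ \/>).)*?)\s*Spanish")

crossing = "Audio foo <br /> Spanish"      # should NOT match
trailing = "Audio Language Spanish<br />"  # should match

# First iteration: the lookahead is checked once, right after "Audio",
# so "<br />" preceded by non-space characters slips through.
first_on_crossing = bool(first_try.search(crossing))

# Second iteration: the lookahead scans the rest of the input, so a
# "<br />" after "Spanish" blocks a legitimate match.
second_on_trailing = bool(second_try.search(trailing))

# Final pattern: the lookahead is checked before every consumed character.
final_on_crossing = bool(final.search(crossing))
final_on_trailing = bool(final.search(trailing))

print(first_on_crossing)   # True  (false positive)
print(second_on_trailing)  # False (false negative)
print(final_on_crossing, final_on_trailing)  # False True
```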

Upvotes: 2
