Undistraction

Reputation: 43351

Restrict part of pattern based on first character or previous occurrence of character

I have the following regexp:

/^(?:(?:>|<)?[a-zA-Z]+(?:(?:\+|-)\d*\.?\d*(?:em)?)?)(?:<[a-zA-Z]+(?:(?:\+|-)\d*\.?\d*(?:em)?)?)?$/

Which you can think about like this:

^
  (?:
    (?:>|<)?
    [a-zA-Z]+
    (?:(?:\+|-)\d*\.?\d*(?:em)?)?
  )
  (?:
    <
    [a-zA-Z]+
    (?:(?:\+|-)\d*\.?\d*(?:em)?)?
  )?
$

It is effectively the same pattern repeated once or twice with a small difference. The core of each instance is one or more letters [a-zA-Z], optionally followed by a plus or minus sign and a numeric value, which may itself be followed by em. The first instance can start with either < or >, while the second instance must start with <.

So the following are all valid:

  `alpha`,
  `alphaBravo`,
  `alphaBravoCharlie`,
  `>alpha`,
  `<alpha`,
  `>alpha+10`,
  `<alpha+10`,
  `>alpha+1.5`,
  `<alpha+1.5`,
  `>alpha-10`,
  `<alpha-10`,
  `>alpha-1.5`,
  `<alpha-1.5`,
  `>alpha+10em`,
  `<alpha+10em`,
  `>alpha+1.5em`,
  `<alpha+1.5em`,
  `>alpha-1.5em`,
  `<alpha-1.5em`,
  `alpha-50em<delta-100em`,
  `alpha-50em<delta+100em`,
  `>alpha-50em<delta+100em`,

My problem is that if the first instance starts with a < then the second instance shouldn't be allowed, so the following should be invalid:

<alpha<bravo
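To make the problem concrete, here is a minimal sketch that runs the pattern above against one wanted and one unwanted input. Python's re module is an assumption (the question does not name a language), and is_valid is a hypothetical helper name.

```python
import re

# The pattern from the question, unchanged (without the / delimiters).
ORIGINAL = re.compile(
    r"^(?:(?:>|<)?[a-zA-Z]+(?:(?:\+|-)\d*\.?\d*(?:em)?)?)"
    r"(?:<[a-zA-Z]+(?:(?:\+|-)\d*\.?\d*(?:em)?)?)?$"
)

def is_valid(value: str) -> bool:
    """Return True if the whole string matches the original pattern."""
    return ORIGINAL.match(value) is not None

print(is_valid(">alpha-50em<delta+100em"))  # True, as intended
print(is_valid("<alpha<bravo"))             # also True, which is the problem
```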

Is it possible to add this restriction to the regexp?

The two approaches I can think of are:

  1. Check the first character and make the second instance invalid if it is <.
  2. Check if < has already occurred in the string (or if < occurs again in the string) and if so, make the second instance invalid.

However I'm not sure how to implement either of these approaches here.

Upvotes: 2

Views: 71

Answers (2)

revo

Reputation: 48721

You could add a negative lookahead immediately after the ^ anchor, so the match fails early when the string starts with < and contains a second <:

(?!<[^<\s]*<)


You also don't need alternation to match a single character at a time: (?:>|<) can be written as [<>], and (?:\+|-) as [+-].

Extended mode:

^
  (?!<[^<\s]*<) # added: reject a leading < that is followed by another <
  (?:
    [<>]?
    [a-zA-Z]+
    (?:[-+]\d+(?:\.\d+)?(?:em)?)?
  )
  (?:
    <
    [a-zA-Z]+
    (?:[-+]\d+(?:\.\d+)?(?:em)?)?
  )?
$

In a line:

^(?!<[^<\s]*<)(?:[<>]?[a-zA-Z]+(?:[-+]\d+(?:\.\d+)?(?:em)?)?)(?:<[a-zA-Z]+(?:[-+]\d+(?:\.\d+)?(?:em)?)?)?$
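As a quick check of the one-line pattern, the sketch below compiles it with Python's re module (an assumption, since the thread does not name a language) and tries a few inputs; accepts is a hypothetical helper name. Note that this version also tightens the numeric part to \d+(?:\.\d+)?, so a bare sign such as alpha+ no longer matches, unlike with the question's \d*\.?\d*.

```python
import re

# One-line pattern from this answer, copied verbatim.
WITH_LOOKAHEAD = re.compile(
    r"^(?!<[^<\s]*<)"
    r"(?:[<>]?[a-zA-Z]+(?:[-+]\d+(?:\.\d+)?(?:em)?)?)"
    r"(?:<[a-zA-Z]+(?:[-+]\d+(?:\.\d+)?(?:em)?)?)?$"
)

def accepts(value: str) -> bool:
    """Return True if the whole string matches the lookahead version."""
    return WITH_LOOKAHEAD.match(value) is not None

samples = [
    "alpha",
    "<alpha+1.5em",
    ">alpha-50em<delta+100em",
    "<alpha<bravo",  # leading < plus a second <: rejected by the lookahead
    "alpha+",        # sign without digits: rejected by the stricter numeric part
]
for sample in samples:
    print(f"{sample!r}: {accepts(sample)}")
```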

Upvotes: 3

Ωmega

Reputation: 43673

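Just replace (?:(?:>|<)? with (?:(?:>|<(?!.*<))? to get the desired result. The lookahead (?!.*<) lets a leading < match only when no other < follows it anywhere in the string.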
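The replacement can be sketched as a literal string substitution on the question's pattern. Python's re module is an assumption here, and original, patched and matches_patched are hypothetical names used only for illustration.

```python
import re

# Pattern from the question (without the / delimiters).
original = (
    r"^(?:(?:>|<)?[a-zA-Z]+(?:(?:\+|-)\d*\.?\d*(?:em)?)?)"
    r"(?:<[a-zA-Z]+(?:(?:\+|-)\d*\.?\d*(?:em)?)?)?$"
)

# Apply the suggested replacement once, at the start of the pattern.
patched = original.replace("(?:(?:>|<)?", "(?:(?:>|<(?!.*<))?", 1)
patched_re = re.compile(patched)

def matches_patched(value: str) -> bool:
    """Return True if the whole string matches the patched pattern."""
    return patched_re.match(value) is not None

for sample in ["<alpha", ">alpha<bravo", "alpha<bravo", "<alpha<bravo"]:
    print(f"{sample!r}: {matches_patched(sample)}")
```

Only the last sample is rejected: its leading < is followed by a second <, so the lookahead fails and nothing else in the pattern can consume that first character.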



If you want to extend this restriction from the < character to the > character as well, replace the same part of the pattern, (?:(?:>|<)?, with (?:([<>])(?!.*\1))? and replace the leading < of the second part of your pattern with [<>].
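A minimal sketch of this backreference variant, under two assumptions: Python's re module is used, and the second part's leading character is written as a required [<>]. SUFFIX, NO_REPEAT and matches_no_repeat are hypothetical names.

```python
import re

# Optional numeric suffix, exactly as written in the question's pattern.
SUFFIX = r"(?:(?:\+|-)\d*\.?\d*(?:em)?)?"

# Group 1 captures the leading < or >; (?!.*\1) then forbids that same
# character from appearing again later in the string.
NO_REPEAT = re.compile(
    r"^(?:(?:([<>])(?!.*\1))?[a-zA-Z]+" + SUFFIX + r")"
    + r"(?:[<>][a-zA-Z]+" + SUFFIX + r")?$"
)

def matches_no_repeat(value: str) -> bool:
    """Return True if the whole string matches the backreference variant."""
    return NO_REPEAT.match(value) is not None

for sample in [">alpha<bravo", "<alpha>bravo", "<alpha<bravo", ">alpha>bravo"]:
    print(f"{sample!r}: {matches_no_repeat(sample)}")
```

With these assumptions, mixed pairs such as >alpha<bravo are accepted, while a repeated leading character (<alpha<bravo or >alpha>bravo) is rejected.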


Upvotes: 2
