EN20

Reputation: 53

Regex to find a line with two capture groups that match the same regex but are still different

I am trying to analyse my source code (written in C) for mismatched timer variable comparisons/assignments. I have a range of timers with different timebases (2-250 milliseconds). Every timer variable contains its granularity in milliseconds in its name (e.g. timer10ms), as does every timer-photo and define (e.g. fooTimer10ms, DOO_TIMEOUT_100MS).

Here are some example lines:

fooTimer10ms = timer10ms;
baaTimer20ms = timer10ms;
if (DIFF_100MS(dooTimer10ms) >= DOO_TIMEOUT_100MS)
if (DIFF_100MS(dooTimer10ms) < DOO_TIMEOUT_100MS)

I want to match those lines where the timebases do not correspond (in this case the second, third and fourth lines). So far I have this regex:

(\d{1,3}(?i)ms(?-i)).*[^\d](\d{1,3}(?i)ms(?-i))

This finds every line that contains two of those granularities, so instead of just lines 2, 3 and 4 it matches all of them. The only idea I had to narrow it down is to add a negative lookbehind with a back-reference, like so:

(\d{1,3}(?i)ms(?-i)).*[^\d](\d{1,3}(?i)ms(?-i))(?<!\1)

but this is not allowed because a negative lookbehind has to have a fixed length.

I found these two questions (one, two), but the first does not have the restriction that both capture groups are of the same kind, and the second is looking for equal instances of the capture group.

If what I want can be achieved more easily with something other than regex, I would be happy to know. My mind is just stuck on the belief that regex is capable of this and that I am just not creative enough to use it properly.

Upvotes: 0

Views: 221

Answers (1)

The fourth bird

Reputation: 163427

One option is to match the timer part followed by the digits, and then use a negative lookahead with a backreference to assert that the same text does not occur to the right.

For the example data, a somewhat more specific pattern using a range from 2-250 might be:

.*?(timer(?:2[0-4]\d|250|1?\d\d|[2-9])ms)\b\S*[^\S\r\n]*[<>]?=[^\S\r\n]*\b(?!\S*\1)\S+

The pattern matches

  • .*? Match any char except a newline, as few as possible (non-greedy)
  • ( Capture group 1
    • timer Match literally
    • (?:2[0-4]\d|250|1?\d\d|[2-9]) Match a number in the range 2-250 (loosely: 1?\d\d also accepts two-digit values such as 00 or 01)
    • ms Match literally
  • )\b Close group and a word boundary
  • \S*[^\S\r\n]* Match optional non whitespace chars and optional spaces without newlines
  • [<>]?= Match an optional < or > and =
  • [^\S\r\n]*\b Match optional whitespace chars without a newline and a word boundary
  • (?!\S*\1) Negative lookahead, assert no occurrence of what is captured in group 1 in the value
  • \S+ Match 1+ non whitespace chars
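
To see how the pattern above behaves on the four example lines from the question, here is a minimal Python sketch. The pattern is copied unchanged; the `re.IGNORECASE` flag is an assumption of this sketch (the sample identifiers spell `Timer` with a capital T, so a case-sensitive `timer` would not match them), and the names `PATTERN`, `LINES` and `flagged_line_numbers` are made up for illustration.

```python
import re

# Pattern copied from the answer. IGNORECASE is an assumption of this sketch:
# the sample identifiers contain "Timer" with a capital T.
PATTERN = re.compile(
    r".*?(timer(?:2[0-4]\d|250|1?\d\d|[2-9])ms)\b\S*[^\S\r\n]*[<>]?="
    r"[^\S\r\n]*\b(?!\S*\1)\S+",
    re.IGNORECASE,
)

# The four example lines from the question.
LINES = [
    "fooTimer10ms = timer10ms;",
    "baaTimer20ms = timer10ms;",
    "if (DIFF_100MS(dooTimer10ms) >= DOO_TIMEOUT_100MS)",
    "if (DIFF_100MS(dooTimer10ms) < DOO_TIMEOUT_100MS)",
]


def flagged_line_numbers(lines):
    """Return the 1-based numbers of the lines the pattern matches."""
    return [n for n, line in enumerate(lines, start=1) if PATTERN.search(line)]


if __name__ == "__main__":
    print(flagged_line_numbers(LINES))
```

Under these assumptions the script prints `[2, 3]`. The fourth line is not reported because `[<>]?=` requires an `=` sign, so a bare `<` comparison is skipped; the operator part would need widening if such lines should be caught too. Also note that the backreference includes the `timer` prefix, so the check is "the same timer name does not reappear on the right", not a comparison of the numbers alone.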

Regex demo

Or perhaps a broader pattern matching 1-3 digits and optional whitespace chars which might also match a newline:

.*?(timer\d{1,3}ms\b)\S*\s*[<>]?=\s*\b(?!.*\1)\S+

Regex demo

Note that {1-3} should be written {1,3}, and that \d{1,3} could also match 999.
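
Since the question also asks whether something other than a single regex would be easier, here is a hedged Python sketch of two alternatives that compare only the numbers. The first collects every granularity on a line and checks whether more than one distinct value occurs. The second keeps a single pattern but places the negative lookahead before the second number, which avoids the variable-length lookbehind. All names (`GRANULARITY`, `MISMATCH`, `has_mismatch`, `SAMPLE`) are hypothetical, and treating any `<digits>ms` token as a granularity is an assumption about the naming scheme.

```python
import re

# A granularity token: 1-3 digits not preceded by another digit, then ms/MS.
GRANULARITY = re.compile(r"(?<!\d)(\d{1,3})ms", re.IGNORECASE)

# Single-pattern variant: (?!\1ms) is checked *before* the second number is
# consumed, so no lookbehind with a backreference is needed.
MISMATCH = re.compile(
    r"(?<!\d)(\d{1,3})ms.*(?<!\d)(?!\1ms)\d{1,3}ms",
    re.IGNORECASE,
)

# The four example lines from the question.
SAMPLE = [
    "fooTimer10ms = timer10ms;",
    "baaTimer20ms = timer10ms;",
    "if (DIFF_100MS(dooTimer10ms) >= DOO_TIMEOUT_100MS)",
    "if (DIFF_100MS(dooTimer10ms) < DOO_TIMEOUT_100MS)",
]


def has_mismatch(line):
    """True if the line mentions at least two different granularities."""
    values = {int(v) for v in GRANULARITY.findall(line)}
    return len(values) > 1


if __name__ == "__main__":
    for number, line in enumerate(SAMPLE, start=1):
        print(number, has_mismatch(line), bool(MISMATCH.search(line)))
```

On the sample lines both checks agree: line 1 is accepted and lines 2, 3 and 4 are flagged. The set-based version is probably easier to extend, for example to report which values clash, while the single pattern can be pasted into an editor or grep-like tool that supports lookarounds and backreferences.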

Upvotes: 2
