WinterMute

Reputation: 195

Regex matching multiple groups

I am very new to regex and am trying to create a filter rule to get some matches. For instance, I have a query result like this:

application_outbound_api_external_metrics_service_plus_success_total
application_outbound_api_external_metrics_service_plus_failure_total
application_inbound_api_metrics_service_success_total
application_inbound_api_metrics_service_failure_total

Now I want to filter ONLY the lines which contain "outbound" AND "service_plus" AND "failure".

I tried to play with groups, but somewhere I am misunderstanding something, which leads to wrong results. How can I create such a regex?

Regex which I used:

/(?:outbound)|(?:service_plus)|(?:failure)/

Upvotes: 0

Views: 108

Answers (2)

MonkeyZeus

Reputation: 20747

You need to use lookaheads to assert that several terms are all present, regardless of the order in which they appear:

^(?=.*(?:^|_)outbound(?:_|$))(?=.*(?:^|_)service_plus(?:_|$))(?=.*(?:^|_)failure(?:_|$)).+$
  • ^ - start line anchor
  • (?= - open the positive lookahead aka "ahead of me is..."
    • .* - optionally anything
    • (?:^|_) - start line anchor or underscore
    • outbound - the word "outbound"
    • (?:_|$) - underscore or end line anchor
    • The underscores and line anchors ensure we don't have false positives like "outbounds" or "goutbound"
  • ) - close the positive lookahead
  • Rinse and repeat for "service_plus" and "failure"
  • Since the lookaheads don't consume any chars, the second and third lookaheads also start from the beginning of the line, which allows the terms to appear in any order
  • .+$ - match everything till the end of the line

https://regex101.com/r/Zhl4Mf/1
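
As a rough sketch of how the pattern above behaves in Python: the regex is copied verbatim, re.MULTILINE is assumed so that ^ and $ apply per line, and the last two input lines are made-up test cases (not from the question) that exercise the "any order" and "no false positives" points.

```python
import re

# Pattern from the answer above, split over several string literals for readability.
# re.MULTILINE is an assumption so that ^ and $ anchor at every line.
pattern = re.compile(
    r'^(?=.*(?:^|_)outbound(?:_|$))'
    r'(?=.*(?:^|_)service_plus(?:_|$))'
    r'(?=.*(?:^|_)failure(?:_|$)).+$',
    re.MULTILINE,
)

text = """application_outbound_api_external_metrics_service_plus_success_total
application_outbound_api_external_metrics_service_plus_failure_total
application_inbound_api_metrics_service_success_total
application_inbound_api_metrics_service_failure_total
failure_service_plus_outbound
application_outbounds_service_plus_failure_total"""

# No capturing groups are used, so findall returns the whole matched lines.
matches = pattern.findall(text)
for line in matches:
    print(line)
```

Only the second line and the hypothetical "failure_service_plus_outbound" line should be printed; "outbounds" is rejected because the term must be followed by an underscore or the end of the line.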


If the order does matter then build a regex in the correct order:

^.*_outbound_.*_service_plus_failure_.*$

https://regex101.com/r/b7O5YK/1
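
A minimal sketch of the ordered variant in Python, again assuming re.MULTILINE. Note that this pattern, as written, expects "failure" directly after "service_plus" and an underscore after "failure". The last input line is a hypothetical example with the terms in a different order.

```python
import re

# Ordered pattern from the answer above; re.MULTILINE is an assumption
# so that each line of the input is tested on its own.
ordered = re.compile(r'^.*_outbound_.*_service_plus_failure_.*$', re.MULTILINE)

text = """application_outbound_api_external_metrics_service_plus_success_total
application_outbound_api_external_metrics_service_plus_failure_total
application_inbound_api_metrics_service_success_total
application_inbound_api_metrics_service_failure_total
application_failure_service_plus_outbound_total"""

# Collect the lines in which the terms appear in the required order.
ordered_matches = ordered.findall(text)
for line in ordered_matches:
    print(line)
```

Here only the second line should match; the reordered last line is rejected even though it contains all three terms.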

Upvotes: 1

Booboo

Reputation: 44303

You should use multiple lookahead assertions:

^(?=.*outbound)(?=.*service_plus)(?=.*failure).*\n?

The above should use the MULTILINE flag so that ^ is interpreted as start of string or start of line.

  1. ^ - matches start of string or start of line.
  2. (?=.*outbound) - asserts that at the current position we can match 0 or more non-newline characters followed by 'outbound' without consuming any characters (i.e. the scan position is not advanced)
  3. (?=.*service_plus) - asserts that at the current position we can match 0 or more non-newline characters followed by 'service_plus' without consuming any characters (i.e. the scan position is not advanced)
  4. (?=.*failure) - asserts that at the current position we can match 0 or more non-newline characters followed by 'failure' without consuming any characters (i.e. the scan position is not advanced)
  5. .*\n? - matches 0 or more non-newline characters optionally followed by a newline (optional in case the final line does not terminate in a newline character)

See RegEx Demo

In Python, for example:

import re

lines = """application_outbound_api_external_metrics_service_plus_success_total
application_outbound_api_external_metrics_service_plus_failure_total
application_inbound_api_metrics_service_success_total
application_inbound_api_metrics_service_failure_total
failureoutboundservice_plus"""

rex = re.compile(r'^(?=.*outbound)(?=.*service_plus)(?=.*failure).*\n?', re.M)

filtered_lines = ''.join(rex.findall(lines))
print(filtered_lines)

Prints:

application_outbound_api_external_metrics_service_plus_failure_total
failureoutboundservice_plus
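
For comparison, here is a small sketch of why the alternation from the question gives wrong results: | means OR, so a line matches as soon as any one of the terms is present, whereas the lookaheads require all of them. The variable names are only illustrative.

```python
import re

lines = [
    "application_outbound_api_external_metrics_service_plus_success_total",
    "application_outbound_api_external_metrics_service_plus_failure_total",
    "application_inbound_api_metrics_service_success_total",
    "application_inbound_api_metrics_service_failure_total",
]

# Pattern from the question: matches if ANY of the three terms occurs.
alternation = re.compile(r'(?:outbound)|(?:service_plus)|(?:failure)')
# Lookahead version: matches only if ALL three terms occur.
lookaheads = re.compile(r'^(?=.*outbound)(?=.*service_plus)(?=.*failure)')

any_term = [line for line in lines if alternation.search(line)]
all_terms = [line for line in lines if lookaheads.search(line)]

print(len(any_term), "lines match the alternation")
print(len(all_terms), "line(s) match the lookaheads")
```

With the four lines from the question, the alternation should keep three of them (everything except the inbound success line), while the lookaheads keep only the outbound service_plus failure line.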

Upvotes: 1
