Ephreal

Reputation: 2067

How to format a regex

I am trying to make a warning waiver that can look for known warnings in a log file.

The warnings in the waiver file are copied directly from the log file during a review of the warnings.

The goal is to keep this as simple as possible. However, I found that copying warnings directly was problematic because they can contain absolute paths.

So I added a "tag" that can be inserted into a warning and that the system should look for. The whole string then looks like this:

WARNING:HDLParsers:817 - ":RE[.*]:/modules/top/hdl_src/top.vhd" Line :RE[.*]: Choice . is not a locally static expression.

The tag is :RE[Insert RegEx here]:. The warning string above contains two of these tags, which I am trying to find using Python 3's re module. My pattern is the following:

(:RE\[.*\]\:)

See RegEx101 for reference

My problem with the pattern above is that when there are two tags in the string, it finds only one match, extending from the first tag to the last. How do I set up the regex so that it finds each tag separately?

Regards

Upvotes: 1

Views: 66

Answers (1)

Wiktor Stribiżew

Reputation: 626738

You can use re.findall with the following regex, which assumes that the regular expression inside the square brackets spans from :RE[ up to the first ] that is followed by a colon:

:RE\[.*?]:

See the regex demo. The .*? matches zero or more characters other than a newline, but as few as possible. From the rexegg.com description of the lazy quantifier solution:

The lazy .*? guarantees that the quantified dot only matches as many characters as needed for the rest of the pattern to succeed.

See IDEONE demo

import re
# Lazy .*? stops at the first "]:", so each tag is a separate match
p = re.compile(r':RE\[.*?]:')
test_str = "# Even more comments\nWARNING:HDLParsers:817 - \":RE[.*]:/modules/top/hdl_src/cpu_0342.vhd\" Line :RE[.*]: Choice . is not a locally static expression."
print(p.findall(test_str))  # [':RE[.*]:', ':RE[.*]:']
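To see why the lazy quantifier fixes the problem from the question, here is a small side-by-side sketch that runs the question's greedy pattern and the lazy pattern on the same sample warning. The variable names are just for illustration.

```python
import re

# The sample warning from the question, containing two :RE[...]: tags.
test_str = ('WARNING:HDLParsers:817 - ":RE[.*]:/modules/top/hdl_src/top.vhd" '
            'Line :RE[.*]: Choice . is not a locally static expression.')

greedy = re.compile(r'(:RE\[.*\]\:)')  # pattern from the question
lazy = re.compile(r':RE\[.*?]:')       # pattern from this answer

# Greedy .* first runs to the end of the line, then backtracks only as far
# as the LAST "]:", so everything between the two tags is one single match.
greedy_matches = greedy.findall(test_str)

# Lazy .*? grows one character at a time and stops at the FIRST "]:",
# so each tag is matched on its own.
lazy_matches = lazy.findall(test_str)

print(greedy_matches)
print(lazy_matches)
```

The greedy version yields one string running from the first :RE[ to the second tag's ]:, while the lazy version yields two separate ':RE[.*]:' matches.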

If you need to get the contents between the [ and ], use a capturing group so that re.findall returns just those contents:

p = re.compile(r':RE\[(.*?)]:')

See another demo
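Since the end goal in the question is matching log lines against waiver entries, here is a minimal sketch of one way the captured tag contents could be used. It assumes that everything outside the tags should match literally and that an embedded regex never contains "]:" itself; the helper name waiver_to_regex and the sample log line are hypothetical. It relies on re.split returning the captured groups as well when the pattern contains a capturing group, so literal text and tag contents alternate in the result.

```python
import re

TAG = re.compile(r':RE\[(.*?)]:')

def waiver_to_regex(waiver_line):
    """Hypothetical helper: turn one waiver entry into a compiled regex.

    Text outside the :RE[...]: tags is escaped so it matches literally;
    the text inside each tag is inserted unchanged as a regex.
    """
    # With a capturing group, re.split puts literal text at even indices
    # and the captured tag contents at odd indices.
    parts = TAG.split(waiver_line)
    pattern = ''.join(part if i % 2 else re.escape(part)
                      for i, part in enumerate(parts))
    return re.compile(pattern)

waiver = ('WARNING:HDLParsers:817 - ":RE[.*]:/modules/top/hdl_src/top.vhd" '
          'Line :RE[.*]: Choice . is not a locally static expression.')
# Hypothetical log line with an absolute path and a line number filled in.
log_line = ('WARNING:HDLParsers:817 - "/home/user/proj/modules/top/hdl_src/top.vhd" '
            'Line 123. Choice . is not a locally static expression.')

waived = waiver_to_regex(waiver).fullmatch(log_line) is not None
print(waived)
```

Escaping the literal parts matters here because the warnings themselves contain regex metacharacters (for example the "." in "Choice ."), which would otherwise match any character.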

To obtain indices, use re.finditer (see this demo):

re.finditer(pattern, string, flags=0)
Return an iterator yielding match objects over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result unless they touch the beginning of another match.

p = re.compile(r':RE\[(.*?)]:')
print([x.start(1) for x in p.finditer(test_str)])
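If you also need the end positions or the tag contents, the same match objects expose them: span() covers the whole tag and span(1) covers only the text captured between [ and ]. A short sketch on the question's sample warning:

```python
import re

test_str = ('WARNING:HDLParsers:817 - ":RE[.*]:/modules/top/hdl_src/top.vhd" '
            'Line :RE[.*]: Choice . is not a locally static expression.')
p = re.compile(r':RE\[(.*?)]:')

# For each tag: (start, end) of the whole tag, (start, end) of the
# captured inner regex, and the inner regex text itself.
spans = [(m.span(), m.span(1), m.group(1)) for m in p.finditer(test_str)]
for whole, inner, text in spans:
    print(whole, inner, text)
```

Here the first tag occupies (26, 34) with its contents at (30, 32), and the second tag occupies (69, 77) with its contents at (73, 75).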

Upvotes: 1
