Reputation: 3596
I have a multi-line string below (in Python) and am looking for a regex to extract src, dst and severity. So in the example below, group 1 would be '10.4.180.25', group 2 '34.23.21.10' and group 3 'critical'.
src: 10.4.180.25
dst: 34.23.21.10
natsrc: 20.160.129.5
natdst: 34.33.21.10
... more lines
severity: critical
... more lines
If I try a regex like /src: (\b\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}\b)\ndst: (\b\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}\b)\n/ with the gm flags, it finds src and dst but not severity, which is a few lines down (lines omitted for clarity). Is there a way to do it without including all of the lines between src, dst and severity?
Upvotes: 4
Views: 2268
Reputation: 20414
You can use a lazy (non-greedy) quantifier in the regex to do this:
src: (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\ndst: (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})[\s\S]*?severity: (.+)?\n
I have updated the regex so it actually works now! It searches for the same bit you have, but as there are many lines between the dst: line and the severity: line, we need to skip all of them.
To match any number of lines up to the line beginning with severity:, we need to match any characters, including newlines. To do this, we can use a character set: [\s\S]. This means match any character that is either whitespace or not whitespace, i.e. every character. We then put a lazy quantifier on it so that it matches only as many characters as are needed to reach the severity: line, so this bit is [\s\S]*?severity:.
Now that we are at the severity: line, we want to match and return the characters up to the end of that line (up to the newline character \n). This is done with the similar (.+)?\n syntax, but with a plus as we want to match one or more characters. Also, as we want to return this bit, we need to put it in parentheses.
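As a rough illustration of the explanation above, here is a minimal Python sketch that runs this pattern with re.search. The dots in the IP parts are escaped here so that they match only literal dots, and the sample text is hypothetical: the "... more lines" placeholders from the question are replaced with made-up fields (action, type) purely for the demo.

```python
import re

# Hypothetical sample text; "action" and "type" stand in for the
# lines that the question elided with "... more lines".
log = (
    "src: 10.4.180.25\n"
    "dst: 34.23.21.10\n"
    "natsrc: 20.160.129.5\n"
    "natdst: 34.33.21.10\n"
    "action: alert\n"
    "severity: critical\n"
    "type: threat\n"
)

pattern = re.compile(
    r"src: (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\n"   # group 1: src address
    r"dst: (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"     # group 2: dst address
    r"[\s\S]*?"                                      # lazily skip anything, newlines included
    r"severity: (.+)?\n"                             # group 3: rest of the severity line
)

match = pattern.search(log)
if match:
    src, dst, severity = match.groups()
    print(src, dst, severity)
```

Because [\s\S] matches newlines on its own, no re.DOTALL flag is needed for this version.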
Upvotes: 2
Reputation: 626845
You missed the need to actually match any number of lines that do not start with severity after what your pattern matches. Besides, you may shorten the pattern by using the {3} limiting quantifier in order not to repeat \.\d{1,3} so many times. Note that between a whitespace and a digit the word boundary is implicit; it is already there, so there is no need to use \b.
Use
src:\s*(\d{1,3}(?:\.\d{1,3}){3})\ndst:\s*(\d{1,3}(?:\.\d{1,3}){3})(?:\n(?!severity).+)*?\nseverity:\s*(.+)
See the regex demo
Details
- src: - a literal substring
- \s* - 0+ whitespaces
- (\d{1,3}(?:\.\d{1,3}){3}) - Group 1: IP-like pattern
- \n - a newline
- dst:\s* - dst: with 0+ whitespaces after it
- (\d{1,3}(?:\.\d{1,3}){3}) - Group 2: IP-like pattern
- (?:\n(?!severity).+)*? - 0+ sequences (as few as possible) of
  - \n(?!severity) - a newline not followed with severity
  - .+ - the whole line
- \nseverity:\s* - a newline, severity: substring and 0+ whitespaces
- (.+) - Group 3: 1 or more chars up to the end of the line

Note you do not need any DOTALL modifier with this regex.
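For reference, the pattern above could be used from Python roughly as follows. This is only a sketch: the helper name extract_records and the sample text (including the action, type and app fields and the second record) are assumptions made up for the demo, not part of the question.

```python
import re

# The pattern from this answer, split across lines for readability.
RECORD_RE = re.compile(
    r"src:\s*(\d{1,3}(?:\.\d{1,3}){3})\n"    # group 1: src address
    r"dst:\s*(\d{1,3}(?:\.\d{1,3}){3})"      # group 2: dst address
    r"(?:\n(?!severity).+)*?"                # lines that do not start with "severity"
    r"\nseverity:\s*(.+)"                    # group 3: severity value
)

def extract_records(text):
    """Return a list of (src, dst, severity) tuples found in text."""
    return RECORD_RE.findall(text)

# Hypothetical sample with two records.
sample = (
    "src: 10.4.180.25\n"
    "dst: 34.23.21.10\n"
    "natsrc: 20.160.129.5\n"
    "natdst: 34.33.21.10\n"
    "action: alert\n"
    "severity: critical\n"
    "type: threat\n"
    "src: 10.0.0.1\n"
    "dst: 10.0.0.2\n"
    "app: dns\n"
    "severity: low\n"
)

records = extract_records(sample)
for src, dst, severity in records:
    print(src, dst, severity)
```

Note that the skipped lines are matched with .+, so this sketch assumes there are no empty lines between dst and severity.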
Upvotes: 3