Reputation: 75
Before I start: I know there are already many threads on this topic, but I couldn't find anything specific to my use case, so I would like to ask this question with a practical example based on what I found.
I have a multiline string that contains the content I'm interested in:
hello: "
-foo=value1
-bar=value2
-baz=\"value3\"
"
I would like to capture specific content within the multiline string, so that each capture is an item in this string that starts with -. The conditions are:
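The string is introduced by hello:
Each captured item starts with -
Escaped quotes (\") may appear within the string.
The single-line form hello: "-foo=value1" should be handled too.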
I came up with this pattern:
hello:\s*"\s*-((?:[^"]|\n)*)"
Unfortunately, it doesn't handle escaped quotes, and it captures the whole body as a single group instead of one result per item. I need the captured group to repeat for each line within that string.
Upvotes: 1
Views: 103
Reputation: 978
I tried a recursive approach to this problem. Here is my solution.
I'm using this as my input text block, which I think covers all the possibilities:
source_text = """
hello: "
-foo=value1
-bar=value2
-baz=\"value3\"
"
bla: "
-foo=value1
-bar=value2
-baz=\"value3\"
"
hello: "
-foo=value1
-bar=value2
-baz=\"value3\"
"
hello: -baz=\"value3\"
hello: "-foo=value1"
"""
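Recursively match the regex hello:\s("|\\")?\n?(\s*-\w+=\\?"?\w+\\?"?\n?)+
which captures the required text in match group 2. Group 2 only holds the last repeated item, so the function removes that item from the matched block and runs again on what is left: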
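import re

all_matches = []

def get_all_matches(text):
    # Each hello: block is matched as a whole; group 2 is the last "-key=value" item in it
    for block_match in re.finditer(r'hello:\s("|\\")?\n?(\s*-\w+=\\?"?\w+\\?"?\n?)+', text):
        match = block_match.group(2)
        if match:
            # Store the item without surrounding whitespace or newlines
            all_matches.append(re.sub("\n|\\s", '', match))
            full_match = block_match.group(0)
            # Drop the item just stored and recurse to pick up the previous one
            get_all_matches(full_match.replace(match, ''))

get_all_matches(source_text)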
Output
print(all_matches)
['-baz="value3"', '-bar=value2', '-foo=value1', '-baz="value3"', '-bar=value2', '-foo=value1', '-baz="value3"', '-foo=value1"']
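The recursion is needed because Python's built-in re module keeps only the last repetition of a repeated capturing group. A minimal sketch of that behaviour, using a simplified pattern and a shortened input that are assumptions for illustration (not the pattern above):

```python
import re

# Shortened sample input with two items (an assumption for illustration).
text = 'hello: "\n-foo=value1\n-bar=value2\n"'

# The capturing group sits inside a repeated non-capturing group.
m = re.search(r'hello:\s*"(?:\s*-(\w+=\w+))+\s*"', text)

# re keeps only the final repetition of the group.
last_item = m.group(1)
print(last_item)  # bar=value2
```

The earlier item (foo=value1) is part of the overall match but is no longer available through group(1), which is why the function above peels off one item per call.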
Upvotes: 1
Reputation: 626689
You can use
hello:\s*"(?:\s*-((?:\\.|[^\n"\\])+))+\s*"
See this regex demo at the .NET regex tester. Details:
hello:\s*" - hello:, zero or more whitespace chars, and then "
(?:\s*-((?:\\.|[^\n"\\])+))+ - one or more occurrences of:
    \s*- - zero or more whitespace chars and then a -
    ((?:\\.|[^\n"\\])+) - Group 1: one or more occurrences of any escaped char, or any char other than \, " and a newline
\s*" - zero or more whitespace chars and then "
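To see how the Group 1 part behaves on its own, here is a small sketch using Python's built-in re module. The sample strings are assumptions made up for illustration; only the sub-pattern comes from the regex above.

```python
import re

# Group 1 of the pattern, on its own: escaped pairs (\\.) or plain characters.
ITEM = re.compile(r'(?:\\.|[^\n"\\])+')

samples = [
    r'baz=\"value3\"',   # escaped quotes are consumed as backslash + char pairs
    'foo=value1"rest',   # an unescaped quote ends the item
    'bar=value2\nnext',  # a newline ends the item
]

results = [ITEM.match(sample).group(0) for sample in samples]
for value in results:
    print(value)
```

The first sample is consumed in full, including both \" pairs, while the other two stop right before the unescaped quote and the newline.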
See this C# demo:
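using System;
using System.Linq;
using System.Text.RegularExpressions;
var s = "hello: \"\n -foo=value1\n -bar=value2\n -baz=\\\"value3\\\"\n\"";
var rx = @"hello:\s*""(?:\s*-((?:\\.|[^\n""\\])+))+\s*""";
// Groups[1].Captures holds every value the repeated group captured, not only the last one
var result = Regex.Match(s, rx)?.Groups[1].Captures.Cast<Capture>().Select(x => x.Value).ToList();
foreach (var t in result)
    Console.WriteLine(t);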
See this Python demo:
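import regex  # third-party "regex" module from PyPI, not the built-in re

s = 'hello: "\n -foo=value1\n -bar=value2\n -baz=\\"value3\\"\n"'
rx = r'hello:\s*"(?:\s*-((?:\\.|[^\n"\\])+))+\s*"'
match = regex.search(rx, s)
if match:
    # captures(1) returns every value Group 1 matched across the repetitions
    print(match.captures(1))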
Output (the C# demo prints one value per line; the Python demo prints the same three values as a list):
foo=value1
bar=value2
baz=\"value3\"
This approach won't work in Go, because Go's regexp package keeps only the last value captured by a repeated group. There, you need to extract the whole quoted string with hello:\s*"([^"\\]*(?:\\[\s\S][^"\\]*)*)"
and then split the Group 1 value on line breaks.
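The same extract-then-split idea can be sketched in Python with only the built-in re module, which has the same last-capture limitation. The stripping and the startswith('-') filter are my assumptions about how to turn the split lines into items; adjust them to your data.

```python
import re

s = 'hello: "\n -foo=value1\n -bar=value2\n -baz=\\"value3\\"\n"'

# Step 1: capture everything between the quotes, allowing escaped characters.
whole = re.search(r'hello:\s*"([^"\\]*(?:\\[\s\S][^"\\]*)*)"', s)

items = []
if whole:
    # Step 2: split Group 1 on line breaks and keep the lines that start with "-".
    for line in whole.group(1).splitlines():
        line = line.strip()
        if line.startswith('-'):
            items.append(line[1:])  # drop the leading "-"

print(items)
```

This yields the same three values as the capture-based demos: foo=value1, bar=value2 and baz=\"value3\".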
Upvotes: 1