Reputation: 1183
I have the following string:
[The quick] brown fox [mykey*="is a super-fast9"] animal [mykey^="that"] can run "very rapid" and [otherkey="effortlessly"].
I need to extract the space-separated words inside double quotes, but only when those quotes are themselves inside square brackets that start with a specific keyword (mykey).
So far I have:
The quick
mykey*="is
a
super-fast9"
mykey^="that"
otherkey="effortlessly"
But I want:
is
a
super-fast9
that
Example Link: https://regex101.com/r/zmNse1/2
Upvotes: 0
Views: 329
Reputation: 6258
For what it's worth: since others mentioned string parsing, I thought I'd give one implementation of that here. String-parsing solutions are always more long-winded, but they are typically faster than regular expressions. As a guy who uses regex a LOT, I can still say that I prefer string functions where possible. The only complications with this answer are that you have to know what your assignment operators are, and you can't have escaped double quotes in your string value. I wrote it fairly verbosely, though you could cut out some conditionals or shorten some lines if you wanted fewer bytes of code.
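using System;
using System.Collections.Generic;
using System.Linq;

static class KeywordParser
{
    public static List<string> GetValuesByKeyword(string keyword, string input)
    {
        var vals = new List<string>();
        int startIndex = input.IndexOf('[');
        while (startIndex >= 0)
        {
            var newValue = "";
            if (startIndex < input.Length - 1)
            {
                // Everything after the "[" (surrounding whitespace removed)
                var squareKey = input.Substring(startIndex + 1).Trim();
                if (squareKey.StartsWith(keyword, StringComparison.Ordinal))
                {
                    var squareAssign = squareKey.Substring(keyword.Length).Trim();
                    var assignOp = StartsWithWhich(squareAssign, "=", "+=", "-=", "*=", "/=", "^=", "%=");
                    if (!string.IsNullOrWhiteSpace(assignOp))
                    {
                        var quotedVal = squareAssign.Substring(assignOp.Length).Trim();
                        if (quotedVal.StartsWith("\"", StringComparison.Ordinal))
                        {
                            // The value runs up to the next double quote (escaped quotes are not supported)
                            var endQuoteIndex = quotedVal.IndexOf('"', 1);
                            if (endQuoteIndex > 0)
                            {
                                newValue = quotedVal.Substring(1, endQuoteIndex - 1);
                            }
                        }
                    }
                }
            }
            if (!string.IsNullOrWhiteSpace(newValue))
            {
                vals.Add(newValue);
                // Continue searching for the next "[" after the extracted value
                startIndex = input.IndexOf('[', input.IndexOf(newValue, startIndex, StringComparison.Ordinal) + newValue.Length);
            }
            else startIndex = input.IndexOf('[', startIndex + 1);
        }
        return string.Join(" ", vals).Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).ToList();
    }

    // Returns the first prefix that s starts with, or null if none matches.
    static string StartsWithWhich(string s, params string[] prefixes)
    {
        foreach (var p in prefixes)
            if (s.StartsWith(p, StringComparison.Ordinal)) return p;
        return null;
    }
}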
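Since the answer says the code could be shortened, here is a rough sketch of what a more compact IndexOf-based version might look like. The class and method names are hypothetical, and it assumes no nested brackets, a keyword that contains no brackets itself, no ']' or escaped quotes inside values, and no whitespace around the '=' or before the opening quote (the longer version above tolerates such whitespace).

```csharp
using System;
using System.Collections.Generic;

static class CompactKeywordParser
{
    // Characters allowed directly before '=' (mirrors "+=", "-=", "*=", "/=", "^=", "%=").
    const string OperatorChars = "+-*/^%";

    // Hypothetical condensed variant of the string-parsing approach.
    // Assumes: no nested brackets, no ']' or escaped quotes inside values,
    // and no whitespace between the keyword, operator, '=' and opening quote.
    public static List<string> GetWords(string keyword, string input)
    {
        var words = new List<string>();
        int pos = 0;
        while ((pos = input.IndexOf("[" + keyword, pos, StringComparison.Ordinal)) >= 0)
        {
            int close = input.IndexOf(']', pos);
            if (close < 0) break; // unterminated bracket: nothing more to parse

            // Text between '[' and ']', e.g. mykey*="is a super-fast9"
            string inner = input.Substring(pos + 1, close - pos - 1);
            int eq = inner.IndexOf("=\"", keyword.Length, StringComparison.Ordinal);

            // The bracket body must end with the closing quote, distinct from the opening one.
            if (eq >= 0 && inner.EndsWith("\"", StringComparison.Ordinal) && inner.Length - eq - 3 >= 0)
            {
                // Whatever sits between the keyword and '=' must be empty or one operator char.
                string op = inner.Substring(keyword.Length, eq - keyword.Length);
                if (op.Length == 0 || (op.Length == 1 && OperatorChars.IndexOf(op[0]) >= 0))
                {
                    string value = inner.Substring(eq + 2, inner.Length - eq - 3);
                    words.AddRange(value.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
                }
            }
            pos = close + 1; // resume after this bracket
        }
        return words;
    }
}
```

Checking the operator explicitly means a bracket such as [mykeyboard="..."] is skipped rather than treated as a mykey bracket.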
Upvotes: 0
Reputation: 7948
The solution offered by Wiktor is the most logical one to use, but for the sake of a regex challenge, see this pattern: \[(?!mykey)[^\[]+|([^\s\[=\"]+)(?=[^\"]*\"\]) and check group #1. Demo
\[ # "["
(?! # Negative Look-Ahead
mykey # "mykey"
) # End of Negative Look-Ahead
[^\[] # Character not in [\[] Character Class
+ # (one or more)(greedy)
| # OR
( # Capturing Group (1)
[^\s\[=\"] # Character not in [\s\[=\"] Character Class
+ # (one or more)(greedy)
) # End of Capturing Group (1)
(?= # Look-Ahead
[^\"] # Character not in [\"] Character Class
* # (zero or more)(greedy)
\" # """
\] # "]"
) # End of Look-Ahead
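To read group #1 from C# rather than in the online demo, a sketch could look like this. The class and method names are hypothetical; the pattern is copied from the answer (with mykey hardcoded), written as a verbatim string.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class AlternationDemo
{
    // The answer's pattern as a C# verbatim string (inside @"..." a double quote is written "").
    // Alternative 1 consumes brackets that do not start with mykey, so group 1 stays unset;
    // alternative 2 captures a word only when the next double quote is immediately followed by ].
    const string Pattern = @"\[(?!mykey)[^\[]+|([^\s\[=\""]+)(?=[^\""]*\""\])";

    public static List<string> Words(string input) =>
        Regex.Matches(input, Pattern)
             .Cast<Match>()
             .Where(m => m.Groups[1].Success) // keep only matches of the capturing alternative
             .Select(m => m.Groups[1].Value)
             .ToList();
}
```

Filtering on Groups[1].Success is what discards the matches produced by the first alternative.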
Upvotes: 2
Reputation: 1211
This regex should do what you want:
(?<=\[mykey.?="[^]]*)[\w-]+(?=[^]]*"\])
Demo here
I assumed there cannot be nested brackets. Also, I didn't know what to do with the ^ or * between mykey and the =, so I allowed an optional wildcard (.?).
You might need to escape the backslashes (and the double quotes) in your C# string literal.
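As a sketch of that escaping, here is the same pattern in a regular (non-verbatim) C# string literal, with every backslash doubled and every double quote escaped. .NET supports the variable-length lookbehind the pattern relies on. The class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class LookbehindDemo
{
    // Regular string literal: each regex backslash is doubled (\\[, \\w, \\])
    // and each double quote is written as \".
    const string Pattern = "(?<=\\[mykey.?=\"[^]]*)[\\w-]+(?=[^]]*\"\\])";

    // Each whole match is one word; no capture groups are needed.
    public static List<string> Words(string input) =>
        Regex.Matches(input, Pattern)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToList();
}
```

Because .?, is optional, [mykey="x"] (with no operator) is matched as well.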
Upvotes: 1
Reputation: 626950
You may match the substrings you need with a relatively simple regex that captures the parts between the quotes, and then split the captures on whitespace:
using System;
using System.Linq;
using System.Text.RegularExpressions;

var pattern = "\\[mykey[^][=]+=\"([^\"]*)\"]";
var s = "[The quick] brown fox [mykey*=\"is a super-fast9\"] animal [mykey^=\"that\"] can run \"very rapid\".";
var result = Regex.Matches(s, pattern)
.Cast<Match>()
.SelectMany(v => v.Groups[1].Value.Trim().Split(new[] {" "}, StringSplitOptions.RemoveEmptyEntries))
.ToList();
Console.WriteLine(string.Join("\n", result));
See the C# demo.
The pattern is
\[mykey[^][=]+="([^"]*)"]
See the regex demo.
Pattern details
\[ - a literal [
mykey - a literal substring
[^][=]+ - 1 or more chars other than [, ] and =
= - an equal sign
" - a double quote
([^"]*) - Group 1: any 0+ chars other than "
"] - a literal "] substring
Note that the captured value is first trimmed of leading/trailing whitespace with .Trim(), and .Split(new[] {" "}, StringSplitOptions.RemoveEmptyEntries) then splits the Group 1 value on spaces, with RemoveEmptyEntries discarding the empty strings that consecutive spaces would otherwise produce. If you split with Regex.Split(value, @"\s+") instead (\s+ matches 1 or more whitespace chars, so tabs and newlines count too), the .Trim() is what prevents an empty first or last item.
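To split on any whitespace (tabs, newlines) rather than only spaces, the Group 1 value can be split with Regex.Split and the \s+ pattern. A sketch under that assumption, reusing the same bracket pattern (the class and method names are hypothetical):

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class SplitOnWhitespaceDemo
{
    // Same bracket pattern as above; Group 1 holds the text between the quotes.
    const string Pattern = "\\[mykey[^][=]+=\"([^\"]*)\"]";

    public static List<string> Words(string input) =>
        Regex.Matches(input, Pattern)
             .Cast<Match>()
             // Trim first: leading/trailing whitespace would otherwise give an empty first/last item.
             .SelectMany(m => Regex.Split(m.Groups[1].Value.Trim(), @"\s+"))
             // An empty (or all-whitespace) capture still splits into a single "" entry.
             .Where(w => w.Length > 0)
             .ToList();
}
```

The final Where is needed because Regex.Split on an empty string returns one empty element rather than none.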
Upvotes: 1