Reputation: 6996
I have the following input:
-key1:"val1" -key2: "val2" -key3:(val3) -key4: "(val4)" -key5: val5 -key6: "val-6" -key-7: val7 -key-eight: "val 8"
With only the following assumption about the pattern: each key is prefixed with - and is followed by a value delimited by :
How can I match and extract each key and its corresponding value?
I have so far come up with the following regex:
-(?<key>\S*):\s?(?<val>\S*)
But it does not match the complete value for the last argument, because that value contains a space, and I cannot figure out how to match it.
The expected output should be:
Any help is much appreciated.
Upvotes: 2
Views: 84
Reputation: 653
This should do the trick:
-(?<key>\S*):\s*(?<value>(?(?=")(")(?:(?=(\\?))\2.)*?\1|\S*))
A sample link can be found here.
Basically, it uses an if/then/else conditional to detect whether the value starts with ". The syntax is (?(?=")true regex|false regex). The false regex is yours, \S*, while the true regex matches from the opening quote to the closing quote, allowing backslash escapes in between: (")(?:(?=(\\?))\2.)*?\1
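As a rough illustration of the conditional approach, here is a minimal C# sketch that runs a pattern of the form (?(?=")yes|no) against the input from the question. It assumes .NET's regex engine, where unnamed groups are numbered before named ones, so \1 is the opening quote and \2 the optional backslash. The variable names are only illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Input taken from the question.
var input = "-key1:\"val1\" -key2: \"val2\" -key3:(val3) -key4: \"(val4)\" -key5: val5 -key6: \"val-6\" -key-7: val7 -key-eight: \"val 8\"";

// If the value starts with a quote, match up to the closing quote
// (allowing backslash escapes); otherwise fall back to \S*.
// Assumption: .NET numbering, so \1 = (") and \2 = (\\?).
var pattern = @"-(?<key>\S*):\s*(?<value>(?(?="")("")(?:(?=(\\?))\2.)*?\1|\S*))";

var pairs = Regex.Matches(input, pattern)
    .Cast<Match>()
    .Select(m => (Key: m.Groups["key"].Value, Value: m.Groups["value"].Value))
    .ToList();

foreach (var (key, value) in pairs)
    Console.WriteLine(key + " -> " + value);
```

On this input the sketch should yield eight pairs, with the last value kept whole as "val 8" including its quotes.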
Upvotes: 0
Reputation: 357
I presume you're wanting to keep the brackets and quotation marks as that's what you're doing in the example you gave? If so then the following should work:
-(?<key>\S+):+\s?(?<val>\S+\s?\d+\)?\"?)
This does presume that every value ends with a number, though.
EDIT: Given that the val doesn't always end with a number, but I'm guessing it always starts with val, this is what I have:
-(?<key>\S+):+\s?(?<val>\"?\(?(val)+\s?\S+)
Seems to be working properly...
Upvotes: 1
Reputation: 11233
Try this regex with the Replace function:
(?:^|(?!\S)\s*)-|\s*:\s*
and replace with "\n". You should get the keys and values on separate lines.
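A minimal C# sketch of that replace step, assuming Regex.Replace and the input from the question. Splitting the result back into lines and pairing them up is an addition for illustration, not part of the answer. Note that a value containing : would also be split by this pattern.

```csharp
using System;
using System.Text.RegularExpressions;

// Input taken from the question.
var input = "-key1:\"val1\" -key2: \"val2\" -key3:(val3) -key4: \"(val4)\" -key5: val5 -key6: \"val-6\" -key-7: val7 -key-eight: \"val 8\"";

// Replace each key-introducing '-' (at the start of the input or after
// whitespace) and each ':' (with surrounding whitespace) by a newline.
var separated = Regex.Replace(input, @"(?:^|(?!\S)\s*)-|\s*:\s*", "\n");

// Drop the empty leading entry; the remaining lines alternate key, value.
var lines = separated.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);

for (var i = 0; i + 1 < lines.Length; i += 2)
    Console.WriteLine(lines[i] + " -> " + lines[i + 1]);
```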
Upvotes: 1
Reputation: 10563
Guessing that you want to allow whitespace characters only when they are not at the beginning or end of a value, change your regex to:
-(?<key>\S*):\s?(?<val>\S+(\s*[^-\s])*)
This assumes that the character - preceded by whitespace unquestionably means a new key is beginning; it cannot be part of any value.
For this example:
-key: value -key2: value with whitespace -key3: value-with-hyphens -key4: v
The matches are:
-key: value
-key2: value with whitespace
-key3: value-with-hyphens
-key4: v
It also works perfectly well on your provided example.
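To check that claim, here is a small C# sketch that applies the regex above to the input from the question. The variable names and the extra edge-case string are assumptions added for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Input taken from the question.
var input = "-key1:\"val1\" -key2: \"val2\" -key3:(val3) -key4: \"(val4)\" -key5: val5 -key6: \"val-6\" -key-7: val7 -key-eight: \"val 8\"";

// A value is one run of non-whitespace, optionally followed by more
// characters (with whitespace between them) that are not hyphens.
var pattern = @"-(?<key>\S*):\s?(?<val>\S+(\s*[^-\s])*)";

var pairs = Regex.Matches(input, pattern)
    .Cast<Match>()
    .Select(m => (Key: m.Groups["key"].Value, Value: m.Groups["val"].Value))
    .ToList();

foreach (var (key, value) in pairs)
    Console.WriteLine(key + " -> " + value);

// Hypothetical edge case: a hyphen after the first whitespace-separated word
// ends the value, even when no whitespace precedes the hyphen.
var truncated = Regex.Match("-k: a b-c", pattern).Groups["val"].Value;
Console.WriteLine(truncated);
```

One thing to be aware of: hyphens are accepted only inside the first whitespace-free run of a value, so in the hypothetical input -k: a b-c the value comes back as a b and the -c tail is dropped.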
Upvotes: 4
Reputation: 81493
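A low-tech (non-regex) solution, just as an alternative: split on " -", trim the leading hyphen and stray whitespace, and call ToDictionary if you need a lookup.
var results = input.Split(new[] { " -" }, StringSplitOptions.RemoveEmptyEntries)
    .Select(x => x.TrimStart('-').Split(new[] { ':' }, 2)) // split on the first ':' only
    .Select(p => new { Key = p[0].Trim(), Value = p[1].Trim() });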
Output
key1 -> "val1"
key2 -> "val2"
key3 -> (val3)
key4 -> "(val4)"
key5 -> val5
key6 -> "val-6"
key-7 -> val7
key-eight -> "val 8"
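For completeness, a self-contained C# sketch of the split approach with the ToDictionary step included. It assumes that " -" never occurs inside a value and that keys are unique, since ToDictionary throws on duplicate keys. The variable names are illustrative.

```csharp
using System;
using System.Linq;

// Input taken from the question.
var input = "-key1:\"val1\" -key2: \"val2\" -key3:(val3) -key4: \"(val4)\" -key5: val5 -key6: \"val-6\" -key-7: val7 -key-eight: \"val 8\"";

var results = input
    .Split(new[] { " -" }, StringSplitOptions.RemoveEmptyEntries) // one chunk per key/value pair
    .Select(x => x.TrimStart('-').Split(new[] { ':' }, 2))        // split on the first ':' only
    .ToDictionary(p => p[0].Trim(), p => p[1].Trim());            // trim whitespace around key and value

foreach (var kv in results)
    Console.WriteLine(kv.Key + " -> " + kv.Value);
```

Limiting the inner split to two parts keeps any later : inside a value intact, which the plain Split(':') would not.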
Upvotes: 1