Reputation: 1679
I want to match multiple field/value pairs, each delimited by a colon, on a single line. The value text (and possibly the field text) contains spaces, e.g.
field1 : value1a value1b
answer
match1: Group1=field1, Group2=value1a value1b
or
field1 : value1a value1b field2 : value2a value2b
answer
match1: Group1=field1, Group2=value1a value1b
match2: Group1=field2, Group2=value2a value2b
The best I can do right now is (\w+)\s*:\s*(\w+)
Regex regex = new Regex(@"(\w+)\s*:\s*(\w+)");
Match m = regex.Match("field1 : value1a value1b field2 : value2a value2b");
while (m.Success)
{
    string f = m.Groups[1].Value.Trim();
    string v = m.Groups[2].Value.Trim(); // only the first word of the value is captured
    m = m.NextMatch(); // advance to the next match, otherwise the loop never ends
}
I guess a lookahead may help, but I don't know how to write it. Thank you.
Upvotes: 0
Views: 624
Reputation: 627103
You can use a regex based on a lazy dot:
var matches = Regex.Matches(text, @"(\w+)\s*:\s*(.*?)(?=\s*\w+\s*:|$)");
See the C# demo online and the .NET regex demo (please mind that regex101.com does not support .NET regex flavor).
As you see, there is no need to use a tempered greedy token. The regex means:
(\w+)
- Group 1: one or more letters/digits/underscores
\s*:\s*
- a colon enclosed with zero or more whitespace chars
(.*?)
- Group 2: any zero or more chars other than a newline, as few as possible
(?=\s*\w+\s*:|$)
- up to the first occurrence of one or more word chars enclosed with zero or more whitespace chars and followed by a colon, or the end of string.
Full C# demo:
using System;
using System.Text.RegularExpressions;

public class Test
{
    public static void Main()
    {
        var text = "field1 : value1a value1b field2 : value2a value2b";
        var matches = Regex.Matches(text, @"(\w+)\s*:\s*(.*?)(?=\s*\w+\s*:|$)");
        foreach (Match m in matches)
        {
            Console.WriteLine("-- MATCH FOUND --\nKey: {0}, Value: {1}",
                m.Groups[1].Value, m.Groups[2].Value);
        }
    }
}
Output:
-- MATCH FOUND --
Key: field1, Value: value1a value1b
-- MATCH FOUND --
Key: field2, Value: value2a value2b
Upvotes: 0
Reputation: 20834
You may try
(\w+)\s*:\s*((?:(?!\s*\w+\s*:).)*)
(\w+)
group 1, one or more consecutive word characters
\s*:\s*
a colon with any amount of whitespace around it
(...)
group 2
(?:...)*
a non-capturing group, repeated any number of times
(?!\s*\w+\s*:).
a negative lookahead followed by a single character: the character is consumed only if the text at that position does not begin a word, optionally surrounded by whitespace, that is followed by a colon. Thus group 2 never consumes a word that precedes a colon.
See the test cases
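To show how this pattern could be used from C#, here is a minimal sketch. The class name FieldParser and the method Parse are hypothetical names chosen for illustration and are not part of the answer. The Trim() call is a defensive assumption, not something the pattern requires.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class FieldParser
{
    // Pattern from the answer above: group 1 is the field name,
    // group 2 is the value, which stops before the next "word :" sequence.
    private static readonly Regex PairRegex =
        new Regex(@"(\w+)\s*:\s*((?:(?!\s*\w+\s*:).)*)");

    // Returns the field/value pairs in the order they appear in the text.
    public static List<KeyValuePair<string, string>> Parse(string text)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        foreach (Match m in PairRegex.Matches(text))
        {
            // Trim() is an assumption: it guards against stray whitespace
            // at the end of the input.
            pairs.Add(new KeyValuePair<string, string>(
                m.Groups[1].Value, m.Groups[2].Value.Trim()));
        }
        return pairs;
    }
}
```

Calling FieldParser.Parse on the second sample line from the question should give two pairs: field1 with value1a value1b, and field2 with value2a value2b. A list of pairs is used rather than a dictionary so that repeated field names are kept.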
Upvotes: 3