Reputation: 3
I have an XML-like file, and I need three pieces of information from each component: time, class name, and content.
<Sync Start=25199><P Class=ENCC>
foo
<Sync Start=26522><P Class=ENCC>
bar
<Sync Start=27863><P Class=ENCC>
stack
<Sync Start=30087><P Class=ENCC>
overflow
In this case, the result should be four sets of information, the first being {25199, ENCC, foo}.
Regex exp = new Regex(@"<Sync Start=(.*?)><P Class=(.*?)>(.*?)", RegexOptions.IgnoreCase);
MatchCollection MatchList = exp.Matches(text);
foreach (Match FirstMatch in MatchList){
    GroupCollection groups = FirstMatch.Groups;
    foreach(Group g in groups){
        Console.WriteLine(g.Value);
    }
}
This prints the time and the class name, but not the content.
Please share your experiences and knowledge.
Upvotes: 0
Views: 88
Reputation: 174696
The regex below captures all three values from each of the four records:
/<Sync Start=(.*?)><P Class=(.*?)>\n(\w+)/gm
The C# code would be:
String input = @"<Sync Start=25199><P Class=ENCC>
foo
<Sync Start=26522><P Class=ENCC>
bar
<Sync Start=27863><P Class=ENCC>
stack
<Sync Start=30087><P Class=ENCC>
overflow";
Regex rgx = new Regex(@"(?m)<Sync Start=(.*?)><P Class=(.*?)>\n(\w+)");
foreach (Match m in rgx.Matches(input))
{
    Console.WriteLine(m.Groups[1].Value);
    Console.WriteLine(m.Groups[2].Value);
    Console.WriteLine(m.Groups[3].Value);
}
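For context on why the pattern in the question leaves the content empty: a lazy (.*?) at the very end of a pattern is satisfied by zero characters, so it never has a reason to consume the next line. The sketch below compares the two patterns on a shortened sample; the variable names and the explicit "\n" line endings are assumptions for illustration (a file saved with "\r\n" would likely need \r?\n in the pattern).

```csharp
using System;
using System.Text.RegularExpressions;

// Shortened sample input with explicit "\n" line endings (an assumption).
string text = "<Sync Start=25199><P Class=ENCC>\nfoo\n<Sync Start=26522><P Class=ENCC>\nbar";

// Pattern from the question: the trailing lazy group can match zero characters.
Match lazy = Regex.Match(text, @"<Sync Start=(.*?)><P Class=(.*?)>(.*?)", RegexOptions.IgnoreCase);

// Pattern from this answer: the content must follow \n and be at least one word character.
Match fixedMatch = Regex.Match(text, @"(?m)<Sync Start=(.*?)><P Class=(.*?)>\n(\w+)");

Console.WriteLine($"question pattern, group 3: '{lazy.Groups[3].Value}'");
Console.WriteLine($"answer pattern, group 3: '{fixedMatch.Groups[3].Value}'");
```

Both patterns capture the time and class name identically; only the third group differs.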
Explanation:
<Sync Start=(.*?)> captures all the characters after <Sync Start= up to the next > symbol.
<P Class=(.*?)> captures all the characters after the string <P Class= up to the next > symbol.
(?m) is the multiline modifier. It only changes how ^ and $ behave, so it has no effect on this particular pattern.
\n(\w+) captures the word characters after the newline into group 3.
Upvotes: 1
Reputation: 41838
Use this pattern:
(?m)^<Sync Start=([^>]+)><P Class=([^>]+)>\s*^([^<]\S+)
In the regex demo, see the Group captures in the right pane.
Sample Code
We need to retrieve the matches from Groups 1, 2 and 3.
var myRegex = new Regex(@"(?m)^<Sync Start=([^>]+)><P Class=([^>]+)>\s*^([^<]\S+)");
Match matchResult = myRegex.Match(yourString);
while (matchResult.Success) {
    // A format string is required here; without it, the first value is
    // treated as the format and the other two are never printed
    Console.WriteLine("{0}, {1}, {2}",
        matchResult.Groups[1].Value,
        matchResult.Groups[2].Value,
        matchResult.Groups[3].Value);
    // Add them to whatever data structure you like
    matchResult = matchResult.NextMatch();
}
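As one possible reading of "add them to whatever data structure you like", here is a minimal sketch that collects each record into a list of tuples using the same pattern. The sample string, the records list, and the tuple field names are assumptions for illustration, and int.Parse assumes Start is always a plain integer as in the sample data.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sample input standing in for yourString, with "\n" line endings.
string yourString = "<Sync Start=25199><P Class=ENCC>\nfoo\n<Sync Start=26522><P Class=ENCC>\nbar\n"
                  + "<Sync Start=27863><P Class=ENCC>\nstack\n<Sync Start=30087><P Class=ENCC>\noverflow";

var myRegex = new Regex(@"(?m)^<Sync Start=([^>]+)><P Class=([^>]+)>\s*^([^<]\S+)");

// One tuple per record: (time, class name, content).
var records = new List<(int Time, string ClassName, string Content)>();

foreach (Match m in myRegex.Matches(yourString))
{
    // Groups 1, 2 and 3 hold the time, the class name and the content.
    records.Add((int.Parse(m.Groups[1].Value), m.Groups[2].Value, m.Groups[3].Value));
}

foreach (var r in records)
{
    Console.WriteLine($"{r.Time}, {r.ClassName}, {r.Content}");
}
```

Note that ([^<]\S+) stops at the first whitespace character, so content made of several words would be cut to its first word; a different final group would be needed for that case.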
Explanation
(?m) turns on multi-line mode, allowing ^ and $ to match on each line
^ anchor asserts that we are at the beginning of the line
<Sync Start= matches literal chars
([^>]+) matches one or more chars that are not >
><P Class= matches literal chars
([^>]+) matches one or more chars that are not >
> matches literal char
\s* matches any white-space, including line breaks
^ anchor asserts that we are at the beginning of the line
([^<]\S+) matches a char that is not <, then one or more non-whitespace chars
Upvotes: 1
Reputation: 105019
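Yes, a match can contain a newline character. An explicit \n or \s in the pattern matches it with no options set, and RegexOptions.Singleline makes . match it as well. RegexOptions.Multiline is only needed when ^ and $ should match at the start and end of each line.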
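A minimal sketch of how .NET's regex options treat newlines; the sample string and variable names are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

// Shortened sample input with explicit "\n" line endings (an assumption).
string text = "<Sync Start=25199><P Class=ENCC>\nfoo\n<Sync Start=26522><P Class=ENCC>\nbar";

// No options: '.' does not match '\n', so this pattern cannot cross the line break.
bool dotDefault = Regex.IsMatch(text, @"ENCC>.foo");

// Singleline: '.' matches '\n' too.
bool dotSingleline = Regex.IsMatch(text, @"ENCC>.foo", RegexOptions.Singleline);

// An explicit \n matches a newline without any option.
bool explicitNewline = Regex.IsMatch(text, @"ENCC>\nfoo");

// Multiline: '^' matches at the start of every line, not only at the start of the input.
int lineStartsDefault = Regex.Matches(text, @"^<Sync").Count;
int lineStartsMultiline = Regex.Matches(text, @"^<Sync", RegexOptions.Multiline).Count;

Console.WriteLine($"{dotDefault} {dotSingleline} {explicitNewline} {lineStartsDefault} {lineStartsMultiline}");
// prints "False True True 1 2"
```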
Upvotes: 0