user3863169

Reputation: 3

Can a .NET regular expression include '\n'?

I have a file in an XML-like format.

I need three pieces of information from each component: time, class name, and content.

<Sync Start=25199><P Class=ENCC>
foo
<Sync Start=26522><P Class=ENCC>
bar
<Sync Start=27863><P Class=ENCC>
stack
<Sync Start=30087><P Class=ENCC>
overflow

In this case, the result should be four sets of information, the first being {25199,ENCC,foo}.

Regex exp = new Regex(@"<Sync Start=(.*?)><P Class=(.*?)>(.*?)", RegexOptions.IgnoreCase);
MatchCollection MatchList = exp.Matches(text);
foreach (Match FirstMatch in MatchList){
    GroupCollection groups = FirstMatch.Groups;
    foreach(Group g in groups){
        Console.WriteLine(g.Value);
    }
}

This prints the time and the class name, but not the content.

Please share your experiences and knowledge.

Upvotes: 0

Views: 88

Answers (3)

Avinash Raj

Reputation: 174696

The regex below captures all three values from each of the four entries:

/<Sync Start=(.*?)><P Class=(.*?)>\n(\w+)/gm

The C# code would be:

String input = @"<Sync Start=25199><P Class=ENCC>
foo
<Sync Start=26522><P Class=ENCC>
bar
<Sync Start=27863><P Class=ENCC>
stack
<Sync Start=30087><P Class=ENCC>
overflow";
Regex rgx = new Regex(@"(?m)<Sync Start=(.*?)><P Class=(.*?)>\n(\w+)");
foreach (Match m in rgx.Matches(input))
{
    Console.WriteLine(m.Groups[1].Value);
    Console.WriteLine(m.Groups[2].Value);
    Console.WriteLine(m.Groups[3].Value);
}

Explanation:

  • <Sync Start=(.*?)> captures the characters after <Sync Start= up to the next > symbol (group 1).
  • <P Class=(.*?)> captures the characters after <P Class= up to the next > symbol (group 2).
  • (?m) is the multiline modifier. It only changes how ^ and $ behave, so it is optional here because the pattern uses neither.
  • \n(\w+) captures the word characters after the newline into group 3.
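
If the three values need to be kept rather than printed, something like the following sketch might help. The SyncParser class and its Parse method are hypothetical names used only for illustration, and the \r?\n in the pattern is an assumption added here so that files with Windows line endings also match.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SyncParser
{
    // Hypothetical helper: returns one (time, class name, content) tuple per entry.
    public static List<(string Time, string ClassName, string Content)> Parse(string text)
    {
        // \r?\n accepts both Unix (\n) and Windows (\r\n) line endings.
        var rgx = new Regex(@"<Sync Start=(.*?)><P Class=(.*?)>\r?\n(\w+)");
        var result = new List<(string Time, string ClassName, string Content)>();
        foreach (Match m in rgx.Matches(text))
        {
            result.Add((m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value));
        }
        return result;
    }

    public static void Main()
    {
        // Two entries, one with each kind of line ending.
        string input = "<Sync Start=25199><P Class=ENCC>\nfoo\n" +
                       "<Sync Start=26522><P Class=ENCC>\r\nbar";
        foreach (var (time, cls, content) in Parse(input))
        {
            Console.WriteLine($"{time},{cls},{content}");
        }
    }
}
```

Note that (\w+) stops at the first non-word character, so content containing spaces or punctuation would be cut short with this pattern.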

Upvotes: 1

zx81

Reputation: 41838

Use this pattern:

(?m)^<Sync Start=([^>]+)><P Class=([^>]+)>\s*^([^<]\S+)

Sample Code

We need to retrieve the matches from Groups 1, 2 and 3.

var myRegex = new Regex(@"(?m)^<Sync Start=([^>]+)><P Class=([^>]+)>\s*^([^<]\S+)");
Match matchResult = myRegex.Match(yourString);
while (matchResult.Success) {
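    // The first argument is a format string; without it only Group 1 would be printed.
    Console.WriteLine("{0}, {1}, {2}",
                      matchResult.Groups[1].Value,
                      matchResult.Groups[2].Value,
                      matchResult.Groups[3].Value);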
    // Add them to whatever data structure you like
    matchResult = matchResult.NextMatch();
}

Explanation

  • (?m) turns on multi-line mode, allowing ^ and $ to match at the start and end of each line
  • The ^ anchor asserts that we are at the beginning of a line
  • <Sync Start= matches literal chars
  • ([^>]+) matches one or more chars that are not >
  • ><P Class= matches literal chars
  • ([^>]+) matches one or more chars that are not >
  • > matches a literal char
  • \s* matches zero or more white-space chars, including line breaks
  • The ^ anchor asserts that we are at the beginning of a line
  • ([^<]\S+) matches one char that is not <, then one or more non-whitespace chars, so the content must be at least two characters long

Upvotes: 1

Robert Koritnik

Reputation: 105019

The direct answer to your question

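Yes, a pattern can contain and match a newline character (\n), and no option is required for that. RegexOptions.Multiline only changes ^ and $ so that they match at the start and end of each line; if you want . to match newlines as well, set RegexOptions.Singleline.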
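
As a quick check of how the regex options interact with newlines, a small sketch along these lines can be run. The NewlineDemo class name and the "a\nb" test string are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NewlineDemo
{
    public static void Main()
    {
        string text = "a\nb";

        // \n in a pattern matches a line feed with no options set.
        Console.WriteLine(Regex.IsMatch(text, @"a\nb"));                          // True

        // By default, . does not match \n ...
        Console.WriteLine(Regex.IsMatch(text, @"a.b"));                           // False
        // ... but it does with RegexOptions.Singleline.
        Console.WriteLine(Regex.IsMatch(text, @"a.b", RegexOptions.Singleline));  // True

        // By default, ^ matches only at the start of the input ...
        Console.WriteLine(Regex.IsMatch(text, @"^b"));                            // False
        // ... while RegexOptions.Multiline lets it match after each \n as well.
        Console.WriteLine(Regex.IsMatch(text, @"^b", RegexOptions.Multiline));    // True
    }
}
```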

Upvotes: 0
