Carl Verret

Reputation: 616

C# regex: parse content with a newline included in a group

I'm trying to parse, using C#, a string that follows this pattern (the number and description are separated by a tab; the header is not part of the text and is shown only for the sake of explanation):

#   description
1   first item
2   second item on two or
    more lines of text
3   third item

I would like to get a list of matches, each with a group for the number and a group for the description. I've almost achieved it with the following regex:

(?'number'\d+)(?:\t)(?'description'.+)

This gives me 3 matches, but for the second match the text on the continuation line is always discarded. I can't find how to include text that spans multiple lines in the description group.

Upvotes: 1

Views: 567

Answers (2)

Matt.G

Reputation: 3609

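Try this regex, with the Multiline and Singleline options enabled so that ^ matches at the start of each line and . can match a newline:

(?'number'\d+)\t(?'description'.+?)(?=^\d|\Z)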

Demo
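As a minimal sketch of how this pattern might be used from C#: the class name LazyLookaheadDemo and the method Parse are hypothetical names chosen for illustration, and the sample text is the one from the question. The sketch assumes RegexOptions.Multiline and RegexOptions.Singleline are both set, since the pattern relies on them. Because the lazy group runs right up to the start of the next item, the captured description still ends with a line break, which the sketch trims.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class LazyLookaheadDemo
{
    // Multiline: ^ matches at the start of every line.
    // Singleline: . also matches \n, so the lazy .+? can span lines.
    private static readonly Regex ItemRegex = new Regex(
        @"(?'number'\d+)\t(?'description'.+?)(?=^\d|\Z)",
        RegexOptions.Multiline | RegexOptions.Singleline);

    public static List<(int Number, string Description)> Parse(string text)
    {
        var items = new List<(int Number, string Description)>();
        foreach (Match m in ItemRegex.Matches(text))
        {
            int number = int.Parse(m.Groups["number"].Value);
            // The group stops just before the next item, so it keeps the
            // line break that precedes it; trim trailing whitespace.
            string description = m.Groups["description"].Value.TrimEnd();
            items.Add((number, description));
        }
        return items;
    }

    public static void Main()
    {
        string text =
            "1\tfirst item\n" +
            "2\tsecond item on two or\n" +
            "    more lines of text\n" +
            "3\tthird item";

        foreach (var item in Parse(text))
        {
            Console.WriteLine($"{item.Number}: {item.Description}");
        }
    }
}
```

Note that the lookahead only checks for a digit at the start of a line, so a continuation line that itself begins with a digit would end the description early.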

Upvotes: 0

The fourth bird

Reputation: 163217

You could use a negative lookahead to assert that what follows a newline is not 1+ digits followed by a tab, which would mark the start of the next item.

Repeat this 0+ times, matching the whole line each time, to keep the continuation lines in the description group.

(?'number'\d+)\t(?'description'.+(?:\n(?!\d+\t).*)*)

Explanation

  • (?'number'\d+) Match 1+ digits in group number
  • \t Match a tab
  • (?'description' Named capturing group description
    • .+ Match any char except a newline 1+ times
    • (?: Non capturing group
      • \n(?!\d+\t).* Match a newline, assert that what follows is not 1+ digits and a tab, then match the rest of the line
    • )* Close group and repeat 0+ times
  • ) Close group description

See a .NET regex demo
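A minimal sketch of using this pattern from C#, under a few assumptions: the class name ContinuationLineDemo and the method Parse are hypothetical, the sample text is the one from the question, and Windows line endings are normalised first because the pattern looks for a bare \n while . in .NET still matches \r. No regex options are needed for this pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class ContinuationLineDemo
{
    private static readonly Regex ItemRegex = new Regex(
        @"(?'number'\d+)\t(?'description'.+(?:\n(?!\d+\t).*)*)");

    public static List<(int Number, string Description)> Parse(string text)
    {
        // Assumption: input may use \r\n; convert it so that \n in the
        // pattern lines up with the actual line breaks.
        text = text.Replace("\r\n", "\n");

        var items = new List<(int Number, string Description)>();
        foreach (Match m in ItemRegex.Matches(text))
        {
            int number = int.Parse(m.Groups["number"].Value);
            // A trailing newline at the end of the input is absorbed by
            // the repeated group, so strip it from the last description.
            string description = m.Groups["description"].Value.TrimEnd('\n');
            items.Add((number, description));
        }
        return items;
    }

    public static void Main()
    {
        string text =
            "1\tfirst item\n" +
            "2\tsecond item on two or\n" +
            "    more lines of text\n" +
            "3\tthird item\n";

        foreach (var item in Parse(text))
        {
            Console.WriteLine($"{item.Number}: {item.Description}");
        }
    }
}
```

Because the lookahead requires both digits and a tab, a continuation line that merely starts with a digit stays part of the current description.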

Upvotes: 1
