Reputation: 643

C# Regex to match multiple sections

I have a .txt file with this format:

content-length: 20

blahblahblah
-stop-
content-length: 10

bum
-step-
content-length: 0

<---empty space--->
-step-
content-length: 10

huba
-step-

I use a regex to split the file into sections, one per content-length header, where -step- or -stop- marks the end of a section. My regex is:

((content-length:)\s(\d)[\r\n]+([\s\S]+?)(-stop-|-step-))*

However, if the content length is zero, meaning there is only whitespace before -step- or -stop-, the match also captures the next content-length section. Any idea how to prevent this?

Upvotes: 0

Answers (5)

Tim007

Reputation: 2557

Try this

(?:(?:content-length):\s(?<length>\d+)\n+(?<content>.*?)\n*(?:-stop-|-step-))

Demo

Input:

content-length: 20

blahblahblah
-stop-
content-length: 10

bum
-step-
content-length: 0


-step-
content-length: 10

huba
-step-

Output:

MATCH 1
length  [16-18] `20`
content [20-32] `blahblahblah`
MATCH 2
length  [56-58] `10`
content [60-63] `bum`
MATCH 3
length  [87-88] `0`
content [91-91] ``
MATCH 4
length  [114-116]   `10`
content [118-122]   `huba`
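
As a rough sketch of how a pattern like this could be used from C#, the snippet below runs it with Regex.Matches and reads the named groups. The sample string, the variable names, and the added `\r?` (so that Windows line endings would also match) are assumptions for illustration, not part of the demo above.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input modelled on the question; "\n" line endings are assumed.
string input =
    "content-length: 20\n\nblahblahblah\n-stop-\n" +
    "content-length: 10\n\nbum\n-step-\n" +
    "content-length: 0\n\n\n-step-\n" +
    "content-length: 10\n\nhuba\n-step-\n";

// Same idea as the pattern above; \r? is an added assumption for CRLF files.
string pattern =
    @"content-length:\s(?<length>\d+)(?:\r?\n)+(?<content>.*?)(?:\r?\n)*(?:-stop-|-step-)";

// One match per section; the empty section yields an empty "content" group.
MatchCollection sections = Regex.Matches(input, pattern);
foreach (Match m in sections)
{
    Console.WriteLine(
        $"length={m.Groups["length"].Value} content=[{m.Groups["content"].Value}]");
}
```

Because `.` does not match a newline by default, this only handles single-line bodies. A body spanning several lines would probably need RegexOptions.Singleline, which is not covered here.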

Upvotes: 0

Quinn

Reputation: 4504

I came up with the following regex, though I'm not sure if it is what you want:

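using System.Text.RegularExpressions;

// The whole section is one capturing group, so Regex.Split also returns each matched section.
var pattern = @"(content-length:\s\d+(?:[\s\S]*?)?-(?:stop|step)-)";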
var input = @"content-length: 20

    blahblahblah
    -stop-
    content-length: 10

    bum
    -step-
    content-length: 0


    -step-
    content-length: 10

    huba
    -step-";
var result = Regex.Split(input, pattern);

Upvotes: 1

Chekako

Reputation: 1

Try this code:

((content-length:)\s(\d)[\r\n]*([\s\S]*?)(-stop-|-step-))
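
To illustrate why an optional body matters, here is a small sketch comparing a required body (`[\s\S]+?`, as in the question) with an optional one (`[\s\S]*?`). The sample string and variable names are made up for the example, and `\d+` is used in both patterns so that multi-digit lengths match, which is an assumption on my part.

```csharp
using System;
using System.Text.RegularExpressions;

// An empty section followed by a normal one; "\n" line endings are assumed.
string input = "content-length: 0\n\n\n-step-\ncontent-length: 10\n\nhuba\n-step-\n";

// Body must contain at least one character (+?), as in the question.
string required = @"content-length:\s(\d+)[\r\n]+([\s\S]+?)(-stop-|-step-)";
// Body may be empty (*?).
string optional = @"content-length:\s(\d+)[\r\n]*([\s\S]*?)(-stop-|-step-)";

int requiredCount = Regex.Matches(input, required).Count;
int optionalCount = Regex.Matches(input, optional).Count;

// With +?, the body has to consume the first "-step-" and runs into the next section.
Console.WriteLine($"+? finds {requiredCount} section(s)"); // 1
// With *?, the empty body is allowed, so each section ends at its own terminator.
Console.WriteLine($"*? finds {optionalCount} section(s)"); // 2
```

With the required body, the greedy `[\r\n]+` takes all the newlines, so the first character `[\s\S]+?` can take is the `-` of `-step-`. The match then only ends at the next section's terminator.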

Upvotes: 0

Akash

Reputation: 99

((content-length:)\s(\d+)[\r\n]+(.*)\n*(-stop-|-step-))

Check out the regex here: https://regex101.com/r/wU9uA4/1

Upvotes: 0

Scott Weaver

Reputation: 7361

Try this:

(?:(?:content-length:))\s(\d+)[\r\n]+(.*)?[\r\n]+(?:-stop-|-step-)

Upvotes: 0
