Alex

Reputation: 1644

Match lines before empty new line

The input is like this:

0
00:00:00,000 --> 00:00:00,000
Hello world!

1
00:00:00,000 --> 00:00:00,000
Hello world!
This is my new world.

2
00:00:00,000 --> 00:00:00,000
Hello guys!

Using a clear and fast regex, I want to split that into:

Match 1: `0`
Match 2: `00:00:00,000 --> 00:00:00,000`
Match 3: `Hello world!`

Match 1: `1`
Match 2: `00:00:00,000 --> 00:00:00,000`
Match 3: `Hello world!
This is my new world.`

Match 1: `2`
Match 2: `00:00:00,000 --> 00:00:00,000`
Match 3: `Hello guys!`

I use `(\d+)[\n\r]([\d:,]+\s-->\s[\d:,]+)[\n\r].+` for matching, but the problem is that it does not match two or more lines of text (Match 3 of the second group in the example above).

Note: If you know a readable approach with better performance that does not use regex, feel free to suggest it.

Thanks,
Alireza

Upvotes: 0

Views: 108

Answers (3)

pushpraj

Reputation: 13669

Here you go:

(\d+)[\n\r]([\d:,]+\s-->\s[\d:,]+)[\n](.+(?:[\n]*[^\d|^\n]+)*)

Result:

MATCH 1

  1. [0-1] 0

  2. [2-31] 00:00:00,000 --> 00:00:00,000

  3. [32-44] Hello world!

MATCH 2

  1. [46-47] 1

  2. [48-77] 00:00:00,000 --> 00:00:00,000

  3. [78-112] Hello world! This is my new world.

MATCH 3

  1. [114-115] 2

  2. [116-145] 00:00:00,000 --> 00:00:00,000

  3. [146-157] Hello guys!

try at regex101.com

EDIT

I updated the regex to handle numbers too, so it now matches multiple lines of text, including lines that contain digits. It is also a bit shorter:

(\d+)[\n](.*?)\n((?s).*?)(?=\n\n\d|\Z)

This regex matches the following input:

0
00:00:00,000 --> 00:00:00,000
Hello world!

1
00:00:00,000 --> 00:00:00,000
Hello world!
This is my new world.

2
00:00:00,000 --> 00:00:00,000
Hello guys!
This line contains 123457!
This is third line!
And more lines!

as:

MATCH 1

  1. [0-1] 0

  2. [2-31] 00:00:00,000 --> 00:00:00,000

  3. [32-44] Hello world!

MATCH 2

  1. [46-47] 1

  2. [48-77] 00:00:00,000 --> 00:00:00,000

  3. [78-112] Hello world! This is my new world.

MATCH 3

  1. [114-115] 2

  2. [116-145] 00:00:00,000 --> 00:00:00,000

  3. [146-220] Hello guys! This line contains 123457! This is third line! And more lines!

try at regex101.com
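
For reference, here is a minimal sketch of how the pattern from the EDIT might be applied in C#. The class and method names are hypothetical, and the `\r\n` normalisation is an added assumption: the pattern itself only recognises `\n` line breaks.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SubtitleRegexDemo
{
    // Pattern taken from the EDIT above; it only recognises "\n" line breaks.
    private static readonly Regex EntryPattern =
        new Regex(@"(\d+)[\n](.*?)\n((?s).*?)(?=\n\n\d|\Z)");

    // Hypothetical helper: returns (index, timing, text) for each entry found.
    public static List<(string Index, string Timing, string Text)> Parse(string input)
    {
        // Assumption: Windows line endings are normalised before matching.
        string normalized = input.Replace("\r\n", "\n");

        var entries = new List<(string Index, string Timing, string Text)>();
        foreach (Match m in EntryPattern.Matches(normalized))
        {
            entries.Add((m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value));
        }
        return entries;
    }

    public static void Main()
    {
        // Build the sample with explicit "\n" so the source file's own
        // line endings do not affect the result.
        string input = string.Join("\n",
            "0", "00:00:00,000 --> 00:00:00,000", "Hello world!", "",
            "1", "00:00:00,000 --> 00:00:00,000", "Hello world!", "This is my new world.");

        foreach (var entry in Parse(input))
        {
            Console.WriteLine("Index:  " + entry.Index);
            Console.WriteLine("Timing: " + entry.Timing);
            Console.WriteLine("Text:   " + entry.Text);
            Console.WriteLine();
        }
    }
}
```

Group 3 keeps the embedded line break for multi-line text, even though the regex101 listing above displays it on one line.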

Upvotes: 0

Avinash Raj

Reputation: 174696

You could use the regex below:

/(\d+)[\n\r]([\d:,]+\s-->\s[\d:,]+)(.*?)(?=\n\n|$)/sg

DEMO
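
In case it helps, this is a rough sketch of how the `/sg` flags could be expressed in C#: `s` corresponds to `RegexOptions.Singleline`, and `g` to calling `Regex.Matches` instead of `Regex.Match`. The class and method names are made up for illustration, and the input is assumed to use `\n` line breaks. Note that group 3 starts with the line break that follows the timing line, so the sketch trims it.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SubtitleFlagsDemo
{
    // Same pattern as above. RegexOptions.Singleline plays the role of the
    // "s" flag, so "." in group 3 can also match line breaks.
    private static readonly Regex EntryPattern = new Regex(
        @"(\d+)[\n\r]([\d:,]+\s-->\s[\d:,]+)(.*?)(?=\n\n|$)",
        RegexOptions.Singleline);

    // Hypothetical helper: one (index, timing, text) tuple per entry.
    public static List<(string Index, string Timing, string Text)> Parse(string input)
    {
        var entries = new List<(string Index, string Timing, string Text)>();

        // Regex.Matches returns every match, which is what the "g" flag does.
        foreach (Match m in EntryPattern.Matches(input))
        {
            // Group 3 begins with the line break after the timing line.
            string text = m.Groups[3].Value.Trim();
            entries.Add((m.Groups[1].Value, m.Groups[2].Value, text));
        }
        return entries;
    }

    public static void Main()
    {
        // Assumption: the input uses "\n" line breaks only.
        string input = string.Join("\n",
            "0", "00:00:00,000 --> 00:00:00,000", "Hello world!", "",
            "1", "00:00:00,000 --> 00:00:00,000", "Hello world!", "This is my new world.", "",
            "2", "00:00:00,000 --> 00:00:00,000", "Hello guys!");

        foreach (var entry in Parse(input))
        {
            Console.WriteLine(entry.Index + ": " + entry.Text);
            Console.WriteLine();
        }
    }
}
```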

Upvotes: 1

Jon Skeet

Reputation: 1499860

Well, here's a non-regex approach:

public IEnumerable<List<string>> ReadSeparatedLines(string file)
{
    List<string> lines = new List<string>();
    foreach (var line in File.ReadLines(file))
    {
        if (line == "")
        {
            // Only take action if we've actually got something to return. This
            // handles files starting with blank lines, and also files with
            // multiple consecutive blank lines.
            if (lines.Count > 0)
            {
                yield return lines;
                lines = new List<string>();
            }
        }
        else
        {
            lines.Add(line);
        }
    }
    // Check whether we had any trailing lines to return
    if (lines.Count > 0)
    {
        yield return lines;
    }
}

I would personally find that easier to understand than a regex, but you may have different tastes, of course.
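
As a rough sketch of how each returned block could then be mapped onto the three parts asked for in the question: the first line would be the index, the second the timing, and the remaining lines the text. `SplitOnBlankLines` and `ToEntry` are hypothetical names; the former is just the same loop taking lines directly instead of a file path, so the example runs without a file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SubtitleBlocks
{
    // Hypothetical variant of the grouping loop above that takes the lines
    // directly, so it can be tried without a file on disk.
    public static IEnumerable<List<string>> SplitOnBlankLines(IEnumerable<string> lines)
    {
        var current = new List<string>();
        foreach (var line in lines)
        {
            if (line.Length == 0)
            {
                // A blank line closes the current block, if there is one.
                if (current.Count > 0)
                {
                    yield return current;
                    current = new List<string>();
                }
            }
            else
            {
                current.Add(line);
            }
        }
        if (current.Count > 0)
        {
            yield return current;
        }
    }

    // Assumed layout: line 1 = index, line 2 = timing, remaining lines = text.
    public static (string Index, string Timing, string Text) ToEntry(List<string> block)
    {
        if (block.Count < 2)
        {
            throw new FormatException("A block needs at least an index line and a timing line.");
        }
        return (block[0], block[1], string.Join("\n", block.Skip(2)));
    }

    public static void Main()
    {
        string[] lines =
        {
            "0", "00:00:00,000 --> 00:00:00,000", "Hello world!", "",
            "1", "00:00:00,000 --> 00:00:00,000", "Hello world!", "This is my new world."
        };

        foreach (var block in SplitOnBlankLines(lines))
        {
            var entry = ToEntry(block);
            Console.WriteLine("Index:  " + entry.Index);
            Console.WriteLine("Timing: " + entry.Timing);
            Console.WriteLine("Text:   " + entry.Text);
            Console.WriteLine();
        }
    }
}
```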

Upvotes: 2
