Reputation: 1644
The input is like this:
0
00:00:00,000 --> 00:00:00,000
Hello world!

1
00:00:00,000 --> 00:00:00,000
Hello world!
This is my new world.

2
00:00:00,000 --> 00:00:00,000
Hello guys!
Using a clear and fast regex, I want to split that into:
Match 1: `0`
Match 2: `00:00:00,000 --> 00:00:00,000`
Match 3: `Hello world!`

Match 1: `1`
Match 2: `00:00:00,000 --> 00:00:00,000`
Match 3: `Hello world! This is my new world.`

Match 1: `2`
Match 2: `00:00:00,000 --> 00:00:00,000`
Match 3: `Hello guys!`
I use (\d+)[\n\r]([\d:,]+\s-->\s[\d:,]+)[\n\r].+
for matching, but the problem is that it does not match text that spans two or more lines (Match 3 of the second block in the example above).
Note: If you know a more readable and faster way that does not use regex, feel free to suggest it.
Thanks,
Alireza
Upvotes: 0
Views: 108
Reputation: 13669
Here you go:
(\d+)[\n\r]([\d:,]+\s-->\s[\d:,]+)[\n](.+(?:[\n]*[^\d|^\n]+)*)
Result:
MATCH 1
[0-1] 0
[2-31] 00:00:00,000 --> 00:00:00,000
[32-44] Hello world!
MATCH 2
[46-47] 1
[48-77] 00:00:00,000 --> 00:00:00,000
[78-112] Hello world!
This is my new world.
MATCH 3
[114-115] 2
[116-145] 00:00:00,000 --> 00:00:00,000
[146-157] Hello guys!
Try it at regex101.com.
EDIT
I updated the regex to handle digits inside the text too, so it now matches multiple lines, including lines that contain numbers. It is also a bit shorter:
(\d+)[\n](.*?)\n((?s).*?)(?=\n\n\d|\Z)
This regex matches the following input
0
00:00:00,000 --> 00:00:00,000
Hello world!

1
00:00:00,000 --> 00:00:00,000
Hello world!
This is my new world.

2
00:00:00,000 --> 00:00:00,000
Hello guys!
This line contains 123457!
This is third line!
And more lines!
as
MATCH 1
[0-1] 0
[2-31] 00:00:00,000 --> 00:00:00,000
[32-44] Hello world!
MATCH 2
[46-47] 1
[48-77] 00:00:00,000 --> 00:00:00,000
[78-112] Hello world!
This is my new world.
MATCH 3
[114-115] 2
[116-145] 00:00:00,000 --> 00:00:00,000
[146-220] Hello guys!
This line contains 123457!
This is third line!
And more lines!
Try it at regex101.com.
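For anyone who wants to try the shorter pattern outside regex101, here is a minimal Python sketch. It is an illustration only and not part of the answer as posted: the names `SUBTITLE`, `parse_blocks` and `sample` are made up for the example, and the mid-pattern `(?s)` is written as the scoped group `(?s:...)` because Python 3.11 and later reject global flags that are not at the start of the pattern. It assumes `\n` line endings and a blank line between blocks.

```python
import re

# Same idea as the pattern above: index line, timing line, then lazily
# captured text up to the next "blank line + digit" or the end of input.
# (?s:...) turns on dot-matches-newline for the text group only.
SUBTITLE = re.compile(r"(\d+)\n(.*?)\n((?s:.*?))(?=\n\n\d|\Z)")


def parse_blocks(text):
    """Return one (index, timing, text) tuple per subtitle block."""
    return [m.groups() for m in SUBTITLE.finditer(text)]


# Hypothetical sample input mirroring the example in this answer.
sample = (
    "0\n00:00:00,000 --> 00:00:00,000\nHello world!\n\n"
    "1\n00:00:00,000 --> 00:00:00,000\nHello world!\nThis is my new world.\n\n"
    "2\n00:00:00,000 --> 00:00:00,000\nHello guys!\n"
    "This line contains 123457!\nThis is third line!\nAnd more lines!"
)

for index, timing, body in parse_blocks(sample):
    print(index, timing, repr(body))
```

Because the lookahead requires a digit after the blank line, a text line that merely contains digits (such as `123457` above) does not end the block.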
Upvotes: 0
Reputation: 174696
You could use the regex below:
/(\d+)[\n\r]([\d:,]+\s-->\s[\d:,]+)(.*?)(?=\n\n|$)/sg
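As a rough illustration of how this pattern behaves, here is a small Python sketch. It is an assumption-laden translation, not the answer's own code: the `s` flag becomes `re.DOTALL`, the `g` flag becomes `finditer`, and the names `PATTERN`, `split_subtitles` and `sample` are invented for the example. One thing it makes visible is that group 3 starts with the line break that follows the timing line, so the sketch strips it.

```python
import re

# The pattern from above; re.DOTALL plays the role of the /s flag so that
# (.*?) can run across line breaks until a blank line or the end of input.
PATTERN = re.compile(
    r"(\d+)[\n\r]([\d:,]+\s-->\s[\d:,]+)(.*?)(?=\n\n|$)",
    re.DOTALL,
)


def split_subtitles(text):
    """Return (index, timing, text) tuples; text has its leading newline removed."""
    results = []
    for m in PATTERN.finditer(text):  # finditer stands in for the /g flag
        index, timing, body = m.groups()
        results.append((index, timing, body.lstrip("\r\n")))
    return results


# Hypothetical sample input based on the question's example.
sample = (
    "0\n00:00:00,000 --> 00:00:00,000\nHello world!\n\n"
    "1\n00:00:00,000 --> 00:00:00,000\nHello world!\nThis is my new world.\n\n"
    "2\n00:00:00,000 --> 00:00:00,000\nHello guys!"
)

for item in split_subtitles(sample):
    print(item)
```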
Upvotes: 1
Reputation: 1499860
Well, here's a non-regex approach:
// Needs: using System.Collections.Generic; using System.IO;
public IEnumerable<List<string>> ReadSeparatedLines(string file)
{
    List<string> lines = new List<string>();
    foreach (var line in File.ReadLines(file))
    {
        if (line == "")
        {
            // Only take action if we've actually got something to return. This
            // handles files starting with blank lines, and also files with
            // multiple consecutive blank lines.
            if (lines.Count > 0)
            {
                yield return lines;
                lines = new List<string>();
            }
        }
        else
        {
            lines.Add(line);
        }
    }
    // Check whether we had any trailing lines to return
    if (lines.Count > 0)
    {
        yield return lines;
    }
}
I would personally find that easier to understand than a regex, but you may have different tastes, of course.
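To show how the grouped lines map onto the three pieces the question asks for, here is the same blank-line grouping sketched in Python, followed by a small helper that splits each group into index, timing and text. This is only an illustration under assumptions: `read_separated_blocks`, `to_subtitle` and `sample` are hypothetical names, and every block is assumed to have at least an index line and a timing line.

```python
def read_separated_blocks(lines):
    """Yield lists of consecutive non-blank lines, split on blank lines."""
    block = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line == "":
            # Skip leading and repeated blank lines: only emit non-empty blocks.
            if block:
                yield block
                block = []
        else:
            block.append(line)
    # Emit any trailing block that was not followed by a blank line.
    if block:
        yield block


def to_subtitle(block):
    """Split one block into (index, timing, text); text lines are rejoined."""
    index, timing, *text = block  # assumes at least two lines per block
    return index, timing, "\n".join(text)


# Hypothetical sample input based on the question's example.
sample = (
    "0\n00:00:00,000 --> 00:00:00,000\nHello world!\n\n"
    "1\n00:00:00,000 --> 00:00:00,000\nHello world!\nThis is my new world.\n\n"
    "2\n00:00:00,000 --> 00:00:00,000\nHello guys!\n"
)

for block in read_separated_blocks(sample.splitlines()):
    print(to_subtitle(block))
```

Passing an open file object instead of `sample.splitlines()` should work the same way, since the generator strips line endings itself.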
Upvotes: 2