I'm trying to split a text into paragraphs each time a line contains a certain word. I already managed to split the text at the beginning of that word, but not at the beginning of the line containing that word. What's the right expression?
This is what I have:
string[] paragraphs = Regex.Split(text, @"(?=INT.|EXT.)");
I also want to drop any empty paragraphs from the array.
This is the input:
INT. LOCATION - DAY
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
LOCATION - EXT.
Morbi cursus dictum tempor. Phasellus mattis at massa non porta.
LOCATION INT. - NIGHT
I want to split it into paragraphs while keeping the same layout.
The result I get is:
INT. LOCATION - DAY
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
LOCATION -
EXT.
Morbi cursus dictum tempor. Phasellus mattis at massa non porta.
LOCATION
INT. - NIGHT
The new paragraphs start at the word rather than at the beginning of the line.
This is the desired result:
Paragraph 1
INT. LOCATION - DAY
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Paragraph 2
LOCATION - EXT.
Morbi cursus dictum tempor. Phasellus mattis at massa non porta.
Paragraph 3
LOCATION INT. - NIGHT
Each paragraph should start at the beginning of the line containing INT. or EXT., not at the word itself.
Answer:
Regex.Split(text, "(?=^.+?INT|^.+?EXT)", RegexOptions.Multiline);
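One caveat worth checking: `.+?` requires at least one character before INT or EXT, so a heading line that *begins* with INT. or EXT. anywhere after the first line will not start a new paragraph. The sketch below compares the pattern above with a variant that uses `.*`, a word boundary `\b`, and an escaped dot. The sample text and the variable names `lazy` and `strict` are made up for illustration; the variant pattern is an assumption of this sketch, not part of the original answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical input: the last heading starts with the keyword itself.
string text = "INT. LOCATION - DAY\n" +
              "Lorem ipsum dolor sit amet.\n" +
              "LOCATION - EXT.\n" +
              "Morbi cursus dictum tempor.\n" +
              "INT. KITCHEN - NIGHT\n";

// Pattern from the answer: .+? needs at least one character before INT/EXT,
// so a line that begins with INT. or EXT. is never a split point.
string[] lazy = Regex.Split(text, "(?=^.+?INT|^.+?EXT)", RegexOptions.Multiline);

// Variant (assumption): .* allows zero characters before the keyword, while
// \b and \. limit the match to the tokens INT. and EXT. (a word like POINT is ignored).
// It also matches at position 0, which yields a leading "" that is filtered out here.
string[] strict = Regex.Split(text, @"(?=^.*\b(?:INT|EXT)\.)", RegexOptions.Multiline)
                       .Where(p => p.Length > 0)
                       .ToArray();

Console.WriteLine($"lazy: {lazy.Length} parts, strict: {strict.Length} parts");
// prints "lazy: 2 parts, strict: 3 parts"
```

With the lazy pattern, "INT. KITCHEN - NIGHT" stays glued to the EXT. paragraph; the variant starts a third paragraph there.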
Check it against this sample text:
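using System;
using System.Text.RegularExpressions;

string text = "INT. LOCATION - DAY\n" +
              "Lorem ipsum dolor sit amet, consectetur adipiscing elit.\n" +
              "LOCATION - EXT.\n" +
              "Morbi cursus dictum tempor. Phasellus mattis at massa non porta.\n" +
              "LOCATION INT. - NIGHT\n";

// Multiline makes ^ match at the start of every line. The lookahead is zero-width,
// so each split happens just before a line that contains INT or EXT after at least one character.
string[] res = Regex.Split(text, "(?=^.+?INT|^.+?EXT)", RegexOptions.Multiline);
for (int i = 0; i < res.Length; i++) // Length avoids needing System.Linq for Count()
{
    int paragraphNumber = i + 1;
    Console.WriteLine("paragraph " + paragraphNumber + "\n" + res[i]);
}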
Output:
paragraph 1
INT. LOCATION - DAY
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
paragraph 2
LOCATION - EXT.
Morbi cursus dictum tempor. Phasellus mattis at massa non porta.
paragraph 3
LOCATION INT. - NIGHT
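The question also asks to drop empty paragraphs, which the code above does not do explicitly. A minimal sketch of a reusable helper is below, assuming C# 9 top-level statements. `SplitScenes` is a hypothetical name, and the pattern `(?=^.*\b(?:INT|EXT)\.)` is a variant rather than the answer's exact regex: it also splits before lines that begin with INT. or EXT. The helper removes empty or whitespace-only pieces (including the leading empty string produced by a match at position 0) and trims the trailing line break, so it also copes with Windows `\r\n` line endings.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The answer's sample text, but with Windows (\r\n) line endings.
string text = "INT. LOCATION - DAY\r\n" +
              "Lorem ipsum dolor sit amet, consectetur adipiscing elit.\r\n" +
              "LOCATION - EXT.\r\n" +
              "Morbi cursus dictum tempor. Phasellus mattis at massa non porta.\r\n" +
              "LOCATION INT. - NIGHT\r\n";

string[] paragraphs = SplitScenes(text);
for (int i = 0; i < paragraphs.Length; i++)
{
    Console.WriteLine($"Paragraph {i + 1}");
    Console.WriteLine(paragraphs[i]);
}

// Hypothetical helper: splits before every line containing INT. or EXT.,
// drops empty/whitespace-only pieces, and trims each piece's trailing line break.
static string[] SplitScenes(string input)
{
    // Multiline: ^ matches after every \n. In .NET, '.' does not match \n
    // (it does match \r), so .* never runs past the end of the current line.
    string[] parts = Regex.Split(input, @"(?=^.*\b(?:INT|EXT)\.)", RegexOptions.Multiline);

    return parts
        .Where(p => !string.IsNullOrWhiteSpace(p)) // removes the leading "" from the match at position 0
        .Select(p => p.TrimEnd('\r', '\n'))
        .ToArray();
}
```

Filtering after the split keeps the regex itself simple; trying to suppress the position-0 match inside the pattern (for example with a `(?<!\A)` lookbehind) would work too but is harder to read.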