Reputation: 11
I just can't get this to work.
Scenario: Subtitling, SRT format. If the first of two lines contains an opening italics tag <i> and the italicized part of the text extends into the second line, then the first line needs a closing tag </i> at its end and the second line an opening tag <i> at its beginning.
Approach: If <i> is found in line1, check whether that line also contains a closing tag. If it does, do nothing; if not, replace line1 and its line break with: line1</i>\n<i>.
This is what I've tried:
Find: (.*<i>.*(?!.*</i>.*\n))\n
Replace with: $1</i>\n<i>
Problem: Even when line1 contains a closing tag after the opening tag, this pattern still produces a match.
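The unwanted match seems to come from where the lookahead sits. By the time it is tested, the greedy .* has already consumed the rest of line1, so the lookahead starts at the line break and never sees the earlier </i>. Here is a minimal sketch of the difference, assuming Python's re module as the engine. The names original and fixed are illustrative, and the fixed pattern is one possible repair, not the only one:

```python
import re

# Pattern from the question: the lookahead is tested at the end of line1,
# after ".*" has already swallowed any "</i>" on that line.
original = re.compile(r"(.*<i>.*(?!.*</i>.*\n))\n")

# Sketch of a fix: test the lookahead directly after "<i>", before the
# rest of the line is consumed.
fixed = re.compile(r"(.*<i>(?!.*</i>).*)\n")

closed = "<i>Köpfchen</i> in das Wasser\nSchwänzchen in die <i>Höh</i>.\n"
unclosed = "<i>Alle meine Entchen\nschwimmen auf dem See</i>\n"

print(original.search(closed) is not None)  # True: the unwanted match
print(fixed.search(closed) is not None)     # False: line1 is left alone
print(fixed.sub(r"\1</i>\n<i>", unclosed))
```

With the lookahead moved, only a line whose last <i> has no </i> after it is matched.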
Line1 and line2 refer to the text lines in the blocks below, so ignore the lines with the numbers and the time code.
Example material:
1
00:00:01,000 --> 00:00:03,320
<i>Alle meine Entchen
schwimmen auf dem See</i>
2
00:00:04,240 --> 00:00:06,880
<i>Köpfchen</i> in das Wasser
Schwänzchen in die <i>Höh</i>.
3
00:00:06,960 --> 00:00:08,960
<i>(Musik endet ♪,</i>
<i>Männerstimme, Englisch:)</i>
Block 1 should get a closing tag at the end of line1 and an opening tag at the start of line2.
Blocks 2 and 3 should not be considered a match and should be left alone.
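To make the expected result concrete, here is a rough Python sketch that runs a single substitution over the example material. It assumes the blocks are separated by blank lines, as in a regular SRT file. The trailing lookahead is a guess at the layout: the next line counts as subtitle text unless it is blank or a bare block number. The name UNCLOSED_LINE is made up for this illustration:

```python
import re

# Matches a text line whose last "<i>" is not closed on that line, plus its
# line break. Assumption: the following line is subtitle text unless it is
# blank or consists only of digits (a block number).
UNCLOSED_LINE = re.compile(
    r"^(.*<i>(?!.*</i>).*)\n(?!\s*$|\d+\s*$)", re.MULTILINE
)

srt = """\
1
00:00:01,000 --> 00:00:03,320
<i>Alle meine Entchen
schwimmen auf dem See</i>

2
00:00:04,240 --> 00:00:06,880
<i>Köpfchen</i> in das Wasser
Schwänzchen in die <i>Höh</i>.

3
00:00:06,960 --> 00:00:08,960
<i>(Musik endet ♪,</i>
<i>Männerstimme, Englisch:)</i>
"""

# Close the italics at the end of line1 and reopen them on line2.
result = UNCLOSED_LINE.sub(r"\1</i>\n<i>", srt)
print(result)
```

Only block 1 changes here; blocks 2 and 3 come through untouched. A subtitle whose second line consists only of digits would be misread as a block number, so treat this as a starting point.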
Any help will be greatly appreciated. Best,
Ingo
Upvotes: 1
Views: 125
Reputation: 11
Thank you, everyone, for your fantastic input. It helped me construct the following solution, which also handles a second opening tag on the same line, like this:
<i>Köpfchen</i> in <i>das Wasser
Schwänzchen in die Höh</i>.
=>
<i>Köpfchen</i> in <i>das Wasser</i>
<i>Schwänzchen in die Höh</i>.
and it doesn't introduce any new line breaks.
Step 1:
(?m)(?<=<i>(?!.*</i>).*$?)\r => </i>
Step 2:
(?m)^(?=.*(?<!<i>.*)</i>.*\r?$) => <i>
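The two patterns above use variable-length lookbehind, which Python's built-in re module does not support. For anyone doing this in Python, here is a small sketch of the same close-then-reopen idea applied to one pair of text lines. The function name carry_italics is made up for this illustration, and the lines are assumed to arrive without their line breaks:

```python
import re

# Finds an "<i>" that has no "</i>" after it on the same line.
OPEN_ITALICS = re.compile(r"<i>(?!.*</i>)")

def carry_italics(line1, line2):
    """Close italics left open on line1 and reopen them on line2."""
    if OPEN_ITALICS.search(line1):
        return line1 + "</i>", "<i>" + line2
    return line1, line2

pair = carry_italics("<i>Köpfchen</i> in <i>das Wasser", "Schwänzchen in die Höh</i>.")
print(pair[0])
print(pair[1])
```

Because it works on strings that are already split into lines, no line breaks are added or removed.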
Upvotes: 0
Reputation: 1678
You were close with the negative lookahead. Here's how you might identify a line with an opening <i> that's not followed by its corresponding closing </i>, using JS:
// This should not modify the string, as it
// contains the closing </i> element.
console.log(
  "this <i>is a</i> test".replace(/(?!<i>.+<\/i>)(<i>.+$)/g, '$1</i>')
);
// This one should modify the string, appending
// the closing </i> to the end.
console.log(
  "this <i>is a test".replace(/(?!<i>.+<\/i>)(<i>.+$)/g, '$1</i>')
);
And, here's a demonstration, in Python, as requested:
>>> import re
>>> print(re.sub(r'(?!<i>.+<\/i>)(<i>.+$)', r'\1</i>', "this <i>is a</i> test"))
this <i>is a</i> test
>>> print(re.sub(r'(?!<i>.+<\/i>)(<i>.+$)', r'\1</i>', "this <i>is a test"))
this <i>is a test</i>
Upvotes: 0