DigitalLowlife

Reputation: 11

RegEx: Check if italics closing tag exists in first line

I just can't get this to work. Scenario: subtitling, SRT format. If the first of two lines contains an opening italics tag <i> and the italicized text extends into the second line, then the first line needs a closing tag </i> at its end and the second line needs an opening tag <i> at its beginning.

Approach: If <i> is found in line1, check whether that line also contains a closing tag. If it does, do nothing. If it doesn't, replace line1 and its line break with: line1</i>\n<i>.

This is what I've tried:

Find: (.*<i>.*(?!.*</i>.*\n))\n
Replace with: $1</i>\n<i>

Problem: The pattern still matches even when line1 contains a closing tag after the opening tag.

Line1 and line2 refer to the text lines in the blocks below, so ignore the lines with the numbers and the time code.

Example material:

1
00:00:01,000 --> 00:00:03,320
<i>Alle meine Entchen
schwimmen auf dem See</i>

2
00:00:04,240 --> 00:00:06,880
<i>Köpfchen</i> in das Wasser
Schwänzchen in die <i>Höh</i>.

3
00:00:06,960 --> 00:00:08,960
<i>(Musik endet ♪,</i>
<i>Männerstimme, Englisch:)</i>

Block 1 should get a closing tag at the end of line1 and an opening tag at the start of line2.

Blocks 2 and 3 should not be considered a match and should be left alone.
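
The unwanted match on block 2 can be reproduced outside the editor. This is a minimal sketch in Python, assuming Python's `re` module treats the pattern the same way as the engine behind the find/replace above; the variable names are only for illustration.

```python
import re

# The "Find" pattern from above, unchanged.
pattern = re.compile(r"(.*<i>.*(?!.*</i>.*\n))\n")

# Text lines of block 2: line1 already contains its closing tag.
block2 = "<i>Köpfchen</i> in das Wasser\nSchwänzchen in die <i>Höh</i>.\n"

match = pattern.search(block2)

# The pattern matches line1 although <i>...</i> is closed there. By the time
# the lookahead runs, the second ".*" has already consumed the rest of the
# line (including "</i>"), so the lookahead finds nothing left to reject.
print(repr(match.group(1)) if match else None)
```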

Any help will be greatly appreciated. Best,

Ingo

Upvotes: 1

Views: 125

Answers (2)

DigitalLowlife

Reputation: 11

Thank you everyone for your fantastic input. It helped me construct the following solution, which also handles a second opening tag on the first line, like this:

<i>Köpfchen</i> in <i>das Wasser
Schwänzchen in die Höh</i>.

=>

<i>Köpfchen</i> in <i>das Wasser</i>
<i>Schwänzchen in die Höh</i>.

and it doesn't introduce any new line breaks.

step1

(?m)(?<=<i>(?!.*</i>).*$?)\r   => </i>

step2

(?m)^(?=.*(?<!<i>.*)</i>.*\r?$) => <i>
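
The two patterns above rely on variable-length lookbehind, which Python's built-in `re` module does not support. For anyone who needs the same result in Python, here is a rough sketch that swaps the regexes for plain string searches. The function names `has_unclosed_italic` and `split_italics` are made up for this example, and the input is assumed to be a list of lines without line endings.

```python
from typing import List


def has_unclosed_italic(line: str) -> bool:
    """Return True if the last <i> on the line has no </i> after it."""
    # rfind returns -1 when the tag is absent, so a line without any
    # tags compares -1 > -1 and is reported as not unclosed.
    return line.rfind("<i>") > line.rfind("</i>")


def split_italics(lines: List[str]) -> List[str]:
    """Close an italic run at the end of a line and reopen it on the next one."""
    out = list(lines)
    for idx in range(len(out) - 1):
        # Only touch the next line if it is non-empty, i.e. still part of
        # the same subtitle block and not the blank separator line.
        if has_unclosed_italic(out[idx]) and out[idx + 1].strip():
            out[idx] += "</i>"
            out[idx + 1] = "<i>" + out[idx + 1]
    return out


lines = ["<i>Köpfchen</i> in <i>das Wasser", "Schwänzchen in die Höh</i>."]
print("\n".join(split_italics(lines)))
```

Because each line is checked after the previous one has been updated, an italic run that spans three lines should be closed and reopened on every line break. No line is added or removed.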

Upvotes: 0

dossy

Reputation: 1678

You were close with the negative lookahead. Here's how you might identify a line with an opening <i> that isn't followed by its corresponding closing </i>, in JavaScript:

// this should not modify the string, as it
// contains the closing </i> element
console.log(
  "this <i>is a</i> test".replace(/(?!<i>.+<\/i>)(<i>.+$)/g, '$1</i>')
);

// this one should modify the string, appending
// the closing </i> to the end
console.log(
  "this <i>is a test".replace(/(?!<i>.+<\/i>)(<i>.+$)/g, '$1</i>')
);

And here's the same demonstration in Python, as requested:

>>> import re

>>> print(re.sub(r'(?!<i>.+<\/i>)(<i>.+$)', r'\1</i>', "this <i>is a</i> test"))
this <i>is a</i> test

>>> print(re.sub(r'(?!<i>.+<\/i>)(<i>.+$)', r'\1</i>', "this <i>is a test"))
this <i>is a test</i>
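
The pattern above only appends the closing tag. One possible extension, which is not part of the original demonstration, is to let the match also consume the line break so that the replacement can reopen the tag on the next line. This is a sketch only: it assumes `\n` line endings and two-line subtitles, and the names `UNCLOSED_ITALIC` and `close_and_reopen` are invented here.

```python
import re

# Same negative lookahead as above, but the match now also consumes the
# line break, and only when a non-empty line follows.
UNCLOSED_ITALIC = re.compile(r"(?!<i>.+</i>)(<i>.+)\n(?=.)")


def close_and_reopen(srt_text):
    """Append </i> to a line with an unclosed <i> and reopen it on the next line."""
    return UNCLOSED_ITALIC.sub(r"\1</i>\n<i>", srt_text)


sample = (
    "1\n"
    "00:00:01,000 --> 00:00:03,320\n"
    "<i>Alle meine Entchen\n"
    "schwimmen auf dem See</i>\n"
    "\n"
    "2\n"
    "00:00:04,240 --> 00:00:06,880\n"
    "<i>Köpfchen</i> in das Wasser\n"
    "Schwänzchen in die <i>Höh</i>.\n"
)

print(close_and_reopen(sample))
```

Since `.` does not match a line break, the lookahead only looks for `</i>` on the current line. Text inserted by the replacement is not rescanned, so an italic run spanning three or more lines would need the substitution to be applied repeatedly.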

Upvotes: 0
