SMJune

Reputation: 407

multiline regex pattern for text patterns

I have hundreds of pages of transcript in the following format:

<p><strong>ROGELIO JIMÉNEZ PONS:</strong> Quisiera
<p>Text here...</p>
<p><strong>PRESIDENTE ANDRÉS MANUEL LÓPEZ OBRADOR:</strong>
<p>Text here...</p>
<p>Text here...</p>
<p><strong>PREGUNTA:</strong>
<p>Text here...</p>
<p><strong>PRESIDENTE ANDRÉS MANUEL LÓPEZ OBRADOR:</strong>
<p>Text here...</p>
<p>Text here...</p>
<p>Text here...</p>
<p><strong>INTERLOCUTOR:</strong>

I want to capture and return only what Obrador says:

<p><strong>PRESIDENTE ANDRÉS MANUEL LÓPEZ OBRADOR:</strong>
<p>Text here...</p>
<p>Text here...</p>

<p><strong>PRESIDENTE ANDRÉS MANUEL LÓPEZ OBRADOR:</strong>
<p>Text here...</p>
<p>Text here...</p>
<p>Text here...</p>

I get close with this regex:

<p><strong>PRESIDENTE(.*)\n(.*)?\n?(.*)?\n?(.*)

But it is not quite right: I can't work out how to end the pattern, which should stop at the next line of the form

<p><strong>[ANYTHING NOT PRESIDENTE]

Upvotes: 0

Views: 42

Answers (1)

OBRADOR:<\/strong>\r?\n((?:(?!<p><strong>)^[^\r\n]+\r?\n)+)
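
A minimal sketch of how this pattern might be applied in Python, assuming each page has been read into a single string. The `sample` text and the helper name `obrador_blocks` are made up for illustration and are not part of the original post. Note that the `^` inside the group only matches at the start of each line when multiline mode is enabled, so `re.MULTILINE` is passed here; the negative lookahead `(?!<p><strong>)` is what stops the capture at the next speaker line.

```python
import re

# The pattern from the answer, unchanged. re.MULTILINE makes ^ match at
# the start of every line rather than only at the start of the string.
PATTERN = re.compile(
    r"OBRADOR:<\/strong>\r?\n((?:(?!<p><strong>)^[^\r\n]+\r?\n)+)",
    re.MULTILINE,
)

# Illustrative sample (an assumption, modelled on the question's transcript).
sample = (
    "<p><strong>ROGELIO JIMÉNEZ PONS:</strong> Quisiera\n"
    "<p>Text here...</p>\n"
    "<p><strong>PRESIDENTE ANDRÉS MANUEL LÓPEZ OBRADOR:</strong>\n"
    "<p>Text A</p>\n"
    "<p>Text B</p>\n"
    "<p><strong>PREGUNTA:</strong>\n"
    "<p>Text here...</p>\n"
    "<p><strong>PRESIDENTE ANDRÉS MANUEL LÓPEZ OBRADOR:</strong>\n"
    "<p>Text C</p>\n"
    "<p><strong>INTERLOCUTOR:</strong>\n"
)


def obrador_blocks(text):
    """Return the lines following each OBRADOR heading, one string per turn."""
    return [m.group(1) for m in PATTERN.finditer(text)]


blocks = obrador_blocks(sample)
for block in blocks:
    print(block)
```

Each captured block must end with a newline, so a speaker turn on the very last line of a file with no trailing line break would lose that final line; appending `"\n"` to the text before matching is one way around this.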

Upvotes: 1
