I have hundreds of pages of transcripts in the following format:
<p><strong>ROGELIO JIMÉNEZ PONS:</strong> Quisiera
<p>Text here...</p>
<p><strong>PRESIDENTE ANDRÉS MANUEL LÓPEZ OBRADOR:</strong>
<p>Text here...</p>
<p>Text here...</p>
<p><strong>PREGUNTA:</strong>
<p>Text here...</p>
<p><strong>PRESIDENTE ANDRÉS MANUEL LÓPEZ OBRADOR:</strong>
<p>Text here...</p>
<p>Text here...</p>
<p>Text here...</p>
<p><strong>INTERLOCUTOR:</strong>
I want to capture and return only what López Obrador says:
<p><strong>PRESIDENTE ANDRÉS MANUEL LÓPEZ OBRADOR:</strong>
<p>Text here...</p>
<p>Text here...</p>
<p><strong>PRESIDENTE ANDRÉS MANUEL LÓPEZ OBRADOR:</strong>
<p>Text here...</p>
<p>Text here...</p>
<p>Text here...</p>
I get close with this regex:
<p><strong>PRESIDENTE(.*)\n(.*)?\n?(.*)?\n?(.*)
But it isn't quite right, because I can't work out how to end the pattern. The match should stop at the next
<p><strong>[ANYTHING NOT PRESIDENTE]
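One way to express "stop at the next speaker line" is a negative lookahead repeated line by line instead of a fixed number of `(.*)\n` groups. The sketch below is in Python and makes some assumptions: every speaker label starts its own line with `<p><strong>`, exactly as in the sample, and a blank line may be treated as the end of a turn. `PRESIDENT_BLOCK` and `president_blocks` are hypothetical names.

```python
import re

# Hypothetical helper: matches one PRESIDENTE turn, i.e. the speaker line
# plus every following non-empty line up to (not including) the next line
# that starts with "<p><strong>". Assumes speaker labels begin a line.
PRESIDENT_BLOCK = re.compile(
    r"^<p><strong>PRESIDENTE[^\r\n]*\r?\n"            # the speaker header line
    r"(?:(?!<p><strong>)[^\r\n]+(?:\r?\n|\Z))*",       # paragraphs until the next speaker
    re.MULTILINE,                                      # lets ^ match at each line start
)


def president_blocks(transcript: str) -> list[str]:
    """Return each PRESIDENTE turn (header plus its paragraphs) as one string."""
    return PRESIDENT_BLOCK.findall(transcript)
```

Because the lookahead stops at any `<p><strong>` line, two back-to-back PRESIDENTE turns come back as two separate matches, each with its own header, which mirrors the desired output.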
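Anchor on the end of the speaker label, then capture every following line that is not itself a `<p><strong>` speaker line. Because of the `^` inside the group, the pattern only works with the multiline flag enabled:
OBRADOR:<\/strong>\r?\n((?:(?!<p><strong>)^[^\r\n]+\r?\n)+)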
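For example, in Python the pattern can be compiled with `re.MULTILINE` and used with `re.findall`, which returns only the captured group, meaning the paragraphs without the speaker header. This is a sketch that assumes each paragraph line ends with a newline, as the pattern requires. The `<p>` stripping step and the name `president_paragraphs` are hypothetical additions, not part of the original answer.

```python
import re

# The answer's pattern, unchanged. Without re.MULTILINE, ^ only matches at
# the very start of the string, so nothing after the first line would match.
ANSWER_PATTERN = re.compile(
    r"OBRADOR:<\/strong>\r?\n((?:(?!<p><strong>)^[^\r\n]+\r?\n)+)",
    re.MULTILINE,
)


def president_paragraphs(transcript: str) -> list[list[str]]:
    """Hypothetical helper: for each PRESIDENTE turn, return its paragraphs
    as plain strings with the surrounding <p>...</p> tags removed."""
    turns = []
    for body in ANSWER_PATTERN.findall(transcript):   # one string per turn (group 1)
        lines = body.splitlines()                     # one entry per <p> line
        turns.append([re.sub(r"</?p>", "", line) for line in lines])
    return turns
```

Dropping the flag is a quick way to see why it matters: compiled without `re.MULTILINE`, the same pattern finds no matches on the sample transcript.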