Melvin Richard

Reputation: 433

Regular expression to find the first occurrence of a string starting from the position of another string

My text is

 < <footnotes><footnote><info><![CDATA[Some text ‘ ”https://www.google.com”> AAAA OR    ” https://www.google.com”> AAAA OR    ” https://www.google.com”> AAAA OR   ” https://www.google.com”> AAAA’]]></info></footnote></footnotes><resources></resources>

I need to match the text from the first "https" up to "]]", which I was able to do with:

(?=https).*?(?=\]\])

But what if I first have to find the "info" tag and, from there, match from the first "https" up to "]]"?

Also, is there a way to remove certain characters from inside the matched text? Suppose I get the text between "https" and "]]" and then have to remove every "OR" from the result string.

So the final result from the regex should look like this:

https://www.google.com”> AAAA     ” https://www.google.com”> AAAA     ” https://www.google.com”> AAAA    ” https://www.google.com”> AAAA’

How can I do this with a single regex?

Upvotes: 0

Views: 84

Answers (1)

Tim Biegeleisen

Reputation: 522626

In general, when parsing nested content like XML or HTML, one should use a proper parser, and not a single regex. That being said, the following pattern seems to work, at least for the sample data you showed us given the requirements:

<info>.*?(https.*)\]\]

The text captured by the group in the above pattern is the run of Google URLs appearing after the <info> tag and before the closing double brackets of the CDATA section.
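
As for removing the "OR" tokens: a match always returns one contiguous piece of the input, so dropping characters from inside it generally takes a second substitution step rather than a single pattern. Here is a minimal sketch in Python (the question does not name a language, so that choice is an assumption). The function name extract_links and the shortened sample string are made up for illustration; the pattern is the one shown above.

```python
import re

# Shortened version of the sample input from the question (two URLs instead of four).
text = (
    "<footnotes><footnote><info><![CDATA[Some text ‘ ”https://www.google.com”> AAAA OR"
    "    ” https://www.google.com”> AAAA’]]></info></footnote></footnotes>"
)


def extract_links(source):
    """Hypothetical helper: capture from the first 'https' after <info> up to ']]',
    then delete every standalone 'OR' from the captured text."""
    # Step 1: the pattern from this answer. Group 1 is greedy, so it runs up to
    # the last ']]' on the line; the sample contains only one.
    match = re.search(r"<info>.*?(https.*)\]\]", source)
    if match is None:
        return None
    # Step 2: remove the 'OR' tokens. The surrounding spaces are left alone,
    # which matches the spacing in the expected output from the question.
    return re.sub(r"\bOR\b", "", match.group(1))


result = extract_links(text)
print(result)
```

Note that without a flag such as re.DOTALL the dot does not match newlines, so this only works as written when the <info> element sits on a single line, as in the sample.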


Upvotes: 1
