Regular expression to find the first occurrence of a string starting from the position of another string

Question

My text is

 <  AAAA OR    ” https://www.google.com”> AAAA OR    ” https://www.google.com”> AAAA OR   ” https://www.google.com”> AAAA’]]>

I need to match the text from the first "https" up to "]]", which I was able to do with:

(?=https).*?(?=\]\])
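
For reference, here is a minimal Python sketch of how that pattern behaves. The sample string is a simplified, hypothetical stand-in for the text above (plain ASCII quotes, two URLs), so treat it as an illustration rather than the exact input.

```python
import re

# Hypothetical, simplified stand-in for the text in the question.
text = 'AAAA OR "https://www.google.com"> AAAA OR "https://www.google.com"> AAAA]]>'

# (?=https) anchors the match start at "https"; .*? expands lazily
# until the lookahead (?=\]\]) sees the closing brackets.
m = re.search(r'(?=https).*?(?=\]\])', text)
result = m.group(0) if m else None

print(result)
```

Because both ends are lookaheads, neither the "]]" nor anything before the first "https" ends up in the match.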

But what if I first have to find the text "info" and, from there, match from the first "https" up to "]]"?

Also, is there a way to remove certain characters from the matched text? Suppose I get the text from "https" to "]]" and need to remove every "OR" from the result string.

So the final result from the regex should look like this:

https://www.google.com”> AAAA     ” https://www.google.com”> AAAA     ” https://www.google.com”> AAAA    ” https://www.google.com”> AAAA’

How can I do this with a single regex?

Tim Biegeleisen · Accepted Answer

In general, when parsing nested content like XML or HTML, one should use a proper parser, and not a single regex. That being said, the following pattern seems to work, at least for the sample data you showed us given the requirements:

.*?(https.*)\]\]

The first capture group contains the Google URLs that appear after the tag and before the double closing brackets of the CDATA section.
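
As a rough sketch of how this could be applied in Python: the sample string below is hypothetical (the `<info>` tag name and the CDATA wrapper are assumptions based on the question, and plain ASCII quotes are used). The `info`-anchored variant is also an assumption about what "find the info text first" means. Since a regex match is always one contiguous slice of the input, removing the "OR" tokens is done here as a separate substitution step rather than inside the same pattern.

```python
import re

# Hypothetical sample: the <info> tag and CDATA wrapper are assumed,
# not copied verbatim from the question.
text = ('<info><![CDATA[ AAAA OR "https://www.google.com"> AAAA OR '
        '"https://www.google.com"> AAAA]]></info>')

# Pattern from the answer: skip lazily to the first "https",
# then capture everything up to the closing "]]".
answer_match = re.search(r'.*?(https.*)\]\]', text)

# Variant that first requires the literal "info" (assumed tag name),
# then captures lazily from the first "https" to the first "]]".
info_match = re.search(r'info.*?(https.*?)\]\]', text)

captured = info_match.group(1) if info_match else ''

# Second step: drop the standalone "OR" tokens from the captured text.
cleaned = re.sub(r'\bOR\b', '', captured)

print(cleaned)
```

On this sample both patterns capture the same text because there is only one "]]". With several "]]" in the input, the greedy `.*` in the first pattern would run to the last one, while the lazy `.*?` in the variant stops at the first.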
