Reputation: 433
My text is
< <footnotes><footnote><info><![CDATA[Some text ‘ ”https://www.google.com”> AAAA OR ” https://www.google.com”> AAAA OR ” https://www.google.com”> AAAA OR ” https://www.google.com”> AAAA’]]></info></footnote></footnotes><resources></resources>
I need to find the text from the first "https" up to "]]", and I was able to do it like this:
(?=https).*?(?=\]\])
But what if I first have to find the "info" element, and from there find the text from the first "https" up to "]]"?
And is there a way to remove certain text from inside the match? Suppose I get the text between "https" and "]]" and have to remove every "OR" from the resulting string.
So my final result from the regex would look like this:
https://www.google.com”> AAAA ” https://www.google.com”> AAAA ” https://www.google.com”> AAAA ” https://www.google.com”> AAAA’
How can I do this with a single regex?
Upvotes: 0
Views: 84
Reputation: 522626
In general, when parsing nested content like XML or HTML, one should use a proper parser, and not a single regex. That being said, the following pattern seems to work, at least for the sample data you showed us given the requirements:
<info>.*?(https.*)\]\]
The first capture group of the above pattern contains the Google URLs appearing after the <info> tag and before the double closing brackets of the CDATA section.
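As for removing the "OR" tokens: a regex match is always one contiguous span of the input, so as far as I know a single pattern cannot both capture the text and drop pieces from the middle of it. A second substitution step is the usual workaround. Below is a minimal sketch in Python (the question does not name a language, so that choice is an assumption, as are the helper name `extract_urls` and the decision to omit the stray leading "< " from the sample):

```python
import re

# Sample data from the question; the stray leading "< " is left out here.
text = (
    "<footnotes><footnote><info><![CDATA[Some text ‘ ”https://www.google.com”> AAAA OR "
    "” https://www.google.com”> AAAA OR ” https://www.google.com”> AAAA OR "
    "” https://www.google.com”> AAAA’]]></info></footnote></footnotes><resources></resources>"
)


def extract_urls(source):
    """Hypothetical helper: capture from the first "https" after <info>
    up to the last "]]", then strip the standalone "OR" tokens."""
    # Step 1: the pattern from this answer; DOTALL lets "." cross newlines
    # in case the CDATA content spans several lines.
    match = re.search(r"<info>.*?(https.*)\]\]", source, flags=re.DOTALL)
    if match is None:
        return None
    captured = match.group(1)
    # Step 2: remove each whole-word "OR" plus the whitespace that follows it.
    return re.sub(r"\bOR\b\s*", "", captured)


cleaned = extract_urls(text)
print(cleaned)
```

The `\b` word boundaries keep the substitution from touching an "OR" that happens to sit inside a longer word. Note that the greedy `https.*` runs to the last "]]" in the input, so this only behaves as intended when there is a single CDATA section, as in the sample.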
Upvotes: 1