Extracting links from an XML file using Python

Question

I have a sitemap XML file and I want to run a script that extracts all the URLs and prints them. So far I have tried re.findall(r'(https?://\S+)', url)

I don't want to print the suffix ' /liv '. How do I implement this using regex?

wizzwizz4 · Accepted Answer

Are all of the URLs wrapped in quotation marks or surrounded by spaces? If so, you could do something like:

re.findall(r'(?P<quote>.)(https?://\S+?)(?P=quote)', url)

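Because the pattern contains two groups, re.findall returns a (quote, URL) tuple for each match, so take the second element of each tuple. If you're working with the full matched text instead of just the second group (for example, match.group(0) from re.finditer), you'll have to trim the delimiters with ...[1:-1].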
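
As a rough illustration of the named-group approach, here is a minimal sketch. The sample string and the variable names (text, pattern, urls) are made up for the example and are not from the question; it assumes every URL is preceded and followed by the same delimiter character, so a URL at the very start of the input would not match.

```python
import re

# Hypothetical sample input: each URL is wrapped in matching quotation marks.
text = 'href="https://example.com/a" src=\'http://example.com/b\''

# (?P<quote>.) captures the single character just before the URL;
# (?P=quote) requires that same character to appear right after it.
# \S+? is lazy, so the URL stops at the first matching delimiter.
pattern = re.compile(r'(?P<quote>.)(https?://\S+?)(?P=quote)')

# With two groups in the pattern, findall returns (quote, url) tuples.
matches = pattern.findall(text)

# Keep only the URL part of each tuple.
urls = [url for _quote, url in matches]
print(urls)
```

If the delimiter were a space rather than a quote, the same pattern should apply, as long as there is a space on both sides of each URL.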
