user7623610

Extracting a URL from a string in Python

I have a string like this:

<dd>TF-AIDN, "Proposal for something...", Version 3.4, 18 November 2015 https://www.something.org/en/system/files/files/file-18nov15-en.pdf</dd>

How can I modify the following statement to extract the URL from such a string?

urlfinder = re.compile(r"((https?):((//)|(\\\\))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)", re.MULTILINE|re.UNICODE)

I can't figure out how to modify the regular expression so that it treats < as the end of the URL instead of a space.
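For context on why the pattern above does not stop at `<`: inside a character class, `\+-=` is not three literal characters but a range from `+` (0x2B) to `=` (0x3D), and that range includes `<` (0x3C) as well as `,`, `-`, `.`, `/`, the digits, `:` and `;`. The sketch below, which assumes Python 3's `re` module and reuses the example string, shows the original pattern running on into `</dd` and a copy with the hyphen escaped (`\+\-=`) stopping at the end of the URL. Note that escaping also drops `,` from the class, since it was only matched through the range.

```python
import re

s = ('<dd>TF-AIDN, "Proposal for something...", Version 3.4, 18 November 2015 '
     'https://www.something.org/en/system/files/files/file-18nov15-en.pdf</dd>')

# Original pattern: in the class, "\+-=" is a RANGE from '+' (0x2B) to '=' (0x3D),
# so '<' and '/' are accepted and the match only ends at '>'.
original = re.compile(r"((https?):((//)|(\\\\))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)",
                      re.MULTILINE | re.UNICODE)

# Same pattern with the hyphen escaped: '+', '-' and '=' are now literal,
# so '<' (and ',') are no longer in the class.
fixed = re.compile(r"((https?):((//)|(\\\\))+[\w\d:#@%/;$()~_?\+\-=\\\.&]*)",
                   re.MULTILINE | re.UNICODE)

print(original.search(s).group(1))  # https://www.something.org/en/system/files/files/file-18nov15-en.pdf</dd
print(fixed.search(s).group(1))     # https://www.something.org/en/system/files/files/file-18nov15-en.pdf
```

Moving the `-` to the very start or end of the class (e.g. `...\.&-]`) would have the same effect as escaping it.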

Answers (1)

Federico Piazza

You can use this regex instead:

(http[^<]+)

This matches http followed by one or more characters that are not <, so the match stops at the first < after the URL.
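A minimal sketch of using the answer's regex with Python's `re` module on the example string. The second pattern, `https?://[^\s<]+`, is an added variant (an assumption, not part of the answer) that also stops at whitespace, so text that follows the URL before the next `<` is not swallowed; the second sample string `t` is hypothetical.

```python
import re

s = ('<dd>TF-AIDN, "Proposal for something...", Version 3.4, 18 November 2015 '
     'https://www.something.org/en/system/files/files/file-18nov15-en.pdf</dd>')

# The answer's pattern: "http" followed by one or more characters that are not '<'
url = re.search(r"(http[^<]+)", s).group(1)
print(url)  # https://www.something.org/en/system/files/files/file-18nov15-en.pdf

# Hypothetical variant: require "http://" or "https://" and stop at whitespace or '<'
strict = re.compile(r"https?://[^\s<]+")
print(strict.findall(s))  # ['https://www.something.org/en/system/files/files/file-18nov15-en.pdf']

# Hypothetical input with text after the URL: [^<]+ keeps " (PDF)", the variant does not
t = "<dd>see http://example.com/a.pdf (PDF)</dd>"
print(re.search(r"(http[^<]+)", t).group(1))  # http://example.com/a.pdf (PDF)
print(strict.findall(t))                      # ['http://example.com/a.pdf']
```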
