Extracting a URL from a string in Python

Question

I have a string like this:

TF-AIDN, "Proposal for something...", Version 3.4, 18 November 2015 https://www.something.org/en/system/files/files/file-18nov15-en.pdf

How can I modify the following statement to extract the URL from such a string?

urlfinder = re.compile(r"((https?):((//)|(\\))+[\w\d:#@%/;$()~_?\+-=\\.&]*)", re.MULTILINE|re.UNICODE)

I cannot figure out how to modify the regular expression so that it treats < as the end of a URL instead of a space.

Federico Piazza · Accepted Answer

You can use this regex instead:

(http[^<]+)

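This matches the literal text http followed by one or more characters that are not <, so the match stops just before the first < (or at the end of the string if there is none). Note that [^<] also matches spaces and newlines.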
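As a minimal sketch of how that pattern could be used from Python: the helper name extract_urls and the trailing <br> appended to the sample string are assumptions made for illustration, not part of the original question.

```python
import re

# Pattern from the answer: "http" followed by one or more non-"<" characters.
url_pattern = re.compile(r"(http[^<]+)")


def extract_urls(text):
    """Return every substring that starts with 'http' and runs up to the next '<'."""
    # With a single capturing group, findall returns a list of the group's matches.
    return url_pattern.findall(text)


# Hypothetical input: the question's string with a "<br>" tag appended,
# so that "<" actually marks the end of the URL.
sample = (
    'TF-AIDN, "Proposal for something...", Version 3.4, 18 November 2015 '
    "https://www.something.org/en/system/files/files/file-18nov15-en.pdf<br>"
)

print(extract_urls(sample))
# ['https://www.something.org/en/system/files/files/file-18nov15-en.pdf']
```

If the URL may be followed by ordinary text before the next <, one option is to exclude whitespace as well, for example http[^<\s]+, so the match also ends at the first space.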
