Oli Smith

Reputation: 93

Removing URLs from a string using Python's re

I'm using this to try to remove URLs from a string:

text = re.sub(r'https?:\/\/[A-Za-z0-9\.\/]+', '', text)

Unfortunately, it works for simple URLs but not for complex ones. Something like http://www.example.com/somestuff.html is removed, but something like http://www.example.com/somestuff.html?query=python leaves trailing bits behind (here, ?query=python), because the character class only allows letters, digits, dots, and slashes.
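
To make the failure concrete, here is a minimal reproduction. The sample sentences and variable names are made up for illustration; only the pattern comes from the question.

```python
import re

# Pattern from the question: the class permits only letters, digits, '.' and '/'.
pattern = r'https?:\/\/[A-Za-z0-9\.\/]+'

# Illustrative inputs (assumed, not from real data).
simple = "see http://www.example.com/somestuff.html now"
with_query = "see http://www.example.com/somestuff.html?query=python now"

simple_result = re.sub(pattern, '', simple)
query_result = re.sub(pattern, '', with_query)

print(repr(simple_result))  # the whole URL is removed
print(repr(query_result))   # the match stops at '?', so the query string survives
```

The match ends at the first character outside the class, which is why `?query=python` is left in the second string.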

I think I'm at the limit of my re knowledge, so any help would be much appreciated. Thanks.

Upvotes: 1

Views: 890

Answers (1)

luigigi

Reputation: 4215

Try matching everything up to the next whitespace character:

r"https?:[^\s]+"
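
Plugged into `re.sub`, that pattern might be used roughly like this. The helper name `strip_urls` and the sample text are assumptions for illustration, not part of the original answer.

```python
import re

def strip_urls(text):
    """Remove anything that starts with http: or https: up to the next whitespace.

    Hypothetical helper wrapping the pattern from the answer.
    """
    # [^\s]+ (equivalent to \S+) consumes every non-whitespace character,
    # so query strings and fragments are removed along with the URL.
    return re.sub(r"https?:[^\s]+", "", text)

sample = "docs at http://www.example.com/somestuff.html?query=python are good"
cleaned = strip_urls(sample)
print(repr(cleaned))
```

One caveat with this approach: because it stops only at whitespace, punctuation that directly follows a URL (such as a closing parenthesis or a sentence-ending period) is removed as well, and the spaces around the URL are left in place.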

Upvotes: 3
