Reputation: 3479
How can I parse text and find all instances of hyperlinks with a string? The hyperlink will not be in the html format of <a href="http://test.com">test</a>
but just http://test.com
Secondly, I would like to then convert the original string and replace all instances of hyperlinks into clickable html hyperlinks.
I found an example in this thread:
Easiest way to convert a URL to a hyperlink in a C# string?
but was unable to reproduce it in python :(
Upvotes: 15
Views: 25030
Reputation: 371
Have a look at urlextract.
You can install it running: pip install urlextract
from urlextract import URLExtract
extractor = URLExtract()
urls = extractor.find_urls("Text with URLs. Let's have URL janlipovsky.cz as an example.")
print(urls) # prints: ['janlipovsky.cz']
The main advantage is that urlextract
will find URLs without specifying schema (http, ftp, etc.) It also has a lot of configuration options to tune in the extractor to fit your needs. Everything can be found in documentation.
Upvotes: 2
Reputation: 21469
Here is a much more sophisticated regexp from 2002.
@yoniLavi minified this to:
re.compile(r'\b(?:https?|telnet|gopher|file|wais|ftp):[\w/#~:.?+=&%@!\-.:?\\-]+?(?=[.:?\-]*(?:[^\w/#~:.?+=&%@!\-.:?\-]|$))')
Upvotes: 10
Reputation: 28250
Django also has a solution that doesn't just use regex. It is django.utils.html.urlize(). I found this to be very helpful, especially if you happen to be using django.
You can also extract the code to use in your own project.
Upvotes: 5
Reputation: 11337
Here's a Python port of Easiest way to convert a URL to a hyperlink in a C# string?:
import re
myString = "This is my tweet check it out http://tinyurl.com/blah"
r = re.compile(r"(http://[^ ]+)")
print r.sub(r'<a href="\1">\1</a>', myString)
Output:
This is my tweet check it out <a href="http://tinyurl.com/blah">http://tinyurl.com/blah</a>
Upvotes: 23