Reputation: 33
Could someone convert this PHP regex to Python? I tried several times with no success:
function convertLinks($text) {
    return preg_replace("/(?:(http:\/\/)|(www\.))(\S+\b\/?)([[:punct:]]*)(\s|$)/i",
        "<a href=\"http://$2$3\" rel=\"nofollow\">$1$2$3</a>$4$5", $text);
}
Edit: I found that [:punct:] can be replaced by [!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~], so I tried this:
def convertLinks(text):
    pat = re.compile(ur"""(?:(http://)|(www\.))(\S+\b\/?)([!"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~]*)(\s|$)""", re.IGNORECASE)
    return pat.sub(ur'<a href=\"http://\2\3" rel=\"nofollow\">\1\2\3</a>\4\5', text)
but I received an "unmatched group" error for convertLinks(u"Test www.example.com test").
Upvotes: 1
Views: 2596
Reputation: 1121366
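The expression uses some features that work differently in Python:
Python doesn't have a [[:punct:]] character group; I used a POSIX regex reference to expand it.
The expression uses optional groups, matching either http:// or www. at the start, but then uses both in the replacement. Python 2 (and Python 3 before 3.5) raises an "unmatched group" error for this, because the group that did not take part in the match has no value; from Python 3.5 on, re.sub substitutes an empty string instead. Solution: use a replacement function, which works on every version.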
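To see why the replacement template trips over the optional groups, here is a minimal sketch. It uses a simplified pattern (an assumption for illustration, not the full expression from the question) and shows that the alternative which did not match comes back as None:

```python
import re

# Simplified pattern for illustration: only one of the first two groups can match.
pattern = re.compile(r'(?:(http://)|(www\.))(\S+)')

match = pattern.search('Test www.example.com test')
groups = match.groups()
# groups == (None, 'www.', 'example.com'): group 1 did not participate.

# Turning None into '' is what a replacement function has to do by hand.
safe = tuple(g or '' for g in groups)
# safe == ('', 'www.', 'example.com')
```

A replacement function receives the same match object, so it can apply this None-to-empty-string step before building the output.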
So to get the same functionality, you can use:
import re

_link = re.compile(r'(?:(http://)|(www\.))(\S+\b/?)([!"#$%&\'()*+,\-./:;<=>?@[\\\]^_`{|}~]*)(\s|$)', re.I)

def convertLinks(text):
    def replace(match):
        groups = match.groups()
        protocol = groups[0] or ''  # may be None
        www_lead = groups[1] or ''  # may be None
        return '<a href="http://{1}{2}" rel="nofollow">{0}{1}{2}</a>{3}{4}'.format(
            protocol, www_lead, *groups[2:])
    return _link.sub(replace, text)
Demo:
>>> test = 'Some text with www.stackoverflow.com links in them like http://this.too/with/path?'
>>> convertLinks(test)
'Some text with <a href="http://www.stackoverflow.com" rel="nofollow">www.stackoverflow.com</a> links in them like <a href="http://this.too/with/path" rel="nofollow">http://this.too/with/path</a>?'
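A slightly shorter variant is possible, since match.groups() accepts a default value for groups that did not participate. The sketch below is an alternative under that assumption, not the code from the answer above; the name convert_links_default is made up for illustration:

```python
import re

_link = re.compile(
    r'(?:(http://)|(www\.))(\S+\b/?)'
    r'([!"#$%&\'()*+,\-./:;<=>?@[\\\]^_`{|}~]*)(\s|$)',
    re.I)

def convert_links_default(text):
    """Hypothetical variant: let groups(default='') replace None with ''."""
    def replace(match):
        # Unmatched groups come back as '' instead of None.
        protocol, www_lead, rest, punct, trail = match.groups(default='')
        return '<a href="http://{1}{2}" rel="nofollow">{0}{1}{2}</a>{3}{4}'.format(
            protocol, www_lead, rest, punct, trail)
    return _link.sub(replace, text)

result = convert_links_default('Test www.example.com test')
# result == 'Test <a href="http://www.example.com" rel="nofollow">www.example.com</a> test'
```

This is the input from the question that originally raised the "unmatched group" error.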
Upvotes: 2
Reputation: 59974
If you want to use regex in Python, use the re module; for this example, specifically re.sub.
The syntax is roughly:
output = re.sub(regular_expression, what_it_should_be_replaced_by, input)
Don't forget that re.sub() returns the substituted string rather than modifying the input, so assign the result to a variable.
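As a small runnable illustration of that call (the pattern and input text here are made up for the example), note that the original string stays unchanged and the result comes back as a new string:

```python
import re

text = 'Visit www.example.com today'

# \1 in the replacement refers to the first capturing group of the pattern.
result = re.sub(r'www\.(\S+)', r'<\1>', text)

# result == 'Visit <example.com> today'
# text is still 'Visit www.example.com today'
```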
Upvotes: 0