Reputation: 19
I want to extract a full URL from a string.
My code is:
import re
data = "ahahahttp://www.google.com/a.jpg>hhdhd"
print re.match(r'(ftp|http)://.*\.(jpg|png)$', data)
Output:
None
Expected output:
http://www.google.com/a.jpg
I found many related questions on Stack Overflow, but none of the answers worked for me, so I don't think this is a duplicate. Please help me! Thanks.
Upvotes: 0
Views: 4370
Reputation: 24689
You were close!
Try this instead:
r'(ftp|http)://.*\.(jpg|png)'
You can visualize this here.
I would also make this non-greedy like this:
r'(ftp|http)://.*?\.(jpg|png)'
You can visualize this greedy vs. non-greedy behavior here and here.
By default, .* is greedy and will match as much text as possible, but you want to match as little text as possible.
Your $ anchors the match at the end of the string, but in your example the URL does not end where the string ends.
Another problem is that you're using re.match() and not re.search(). re.match() only matches at the beginning of the string, whereas re.search() looks for a match anywhere in the string. See here for more information.
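Putting these points together, here is a minimal sketch rather than a definitive solution. The `(?:...)` non-capturing groups, the variable names, and the second test string `two` are my own additions for illustration, not part of the question:

```python
import re

data = "ahahahttp://www.google.com/a.jpg>hhdhd"

# Non-greedy pattern without the trailing `$`; (?:...) groups without capturing.
pattern = re.compile(r'(?:ftp|http)://.*?\.(?:jpg|png)')

# match() only tries position 0, where the text starts with "ahaha", so it fails.
matched = pattern.match(data)

# search() scans forward and finds the URL in the middle of the string.
found = pattern.search(data)
url = found.group(0) if found else None
print(url)

# Greedy vs. non-greedy on a hypothetical string containing two image URLs.
two = "x http://a.com/1.jpg y http://b.com/2.png z"
greedy = re.search(r'(?:ftp|http)://.*\.(?:jpg|png)', two).group(0)
lazy = re.search(r'(?:ftp|http)://.*?\.(?:jpg|png)', two).group(0)
print(greedy)  # runs from the first scheme to the last extension
print(lazy)    # stops at the first extension
```

With a single URL in the input, the greedy and non-greedy patterns return the same text; the difference only shows up when a later `.jpg` or `.png` appears in the same string.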
Upvotes: 4
Reputation: 66
You should use search instead of match.
import re
data = "ahahahttp://www.google.com/a.jpg>hhdhd"
url = re.search(r'(ftp|http)://.*\.(jpg|png)', data)
if url:
    print(url.group(0))
Upvotes: 1
Reputation: 82
Find the start of the URL with find() by searching for 'http://' or 'ftp://', then find the end of the URL with find() by searching for '.jpg' or '.png'. Then take the substring between the two positions:
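data = "ahahahttp://www.google.com/a.jpg>hhdhd"
start = data.find('http://')   # index where the URL begins
kk = data[start:]              # drop everything before the URL
end = kk.find('.jpg')          # index of the extension within kk
print(kk[:end + len('.jpg')])  # slice up to and including the extension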
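The snippet above only looks for 'http://' and '.jpg', and find() returns -1 when nothing is found. A rough sketch of the same idea that covers both schemes and both extensions might look like this; the helper name `extract_image_url` and its parameters are hypothetical, chosen just for this example:

```python
def extract_image_url(text, schemes=('http://', 'ftp://'),
                      extensions=('.jpg', '.png')):
    """Return the first scheme-to-extension substring of text, or None."""
    # Positions of every scheme that actually occurs (find() gives -1 otherwise).
    starts = [i for i in (text.find(s) for s in schemes) if i != -1]
    if not starts:
        return None
    start = min(starts)

    # End positions (just past the extension) for extensions found after start.
    ends = []
    for ext in extensions:
        i = text.find(ext, start)
        if i != -1:
            ends.append(i + len(ext))
    if not ends:
        return None

    # Cut at the earliest extension so the result is as short as possible.
    return text[start:min(ends)]


print(extract_image_url("ahahahttp://www.google.com/a.jpg>hhdhd"))
```

Like the original, this is plain substring searching, so it will happily cut at a '.jpg' that appears in the middle of a longer path.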
Upvotes: 0