shiv shankar

Reputation: 19

Extract URL from string in Python

I want to extract a full URL from a string.

My code is:

import re
data = "ahahahttp://www.google.com/a.jpg>hhdhd"
print(re.match(r'(ftp|http)://.*\.(jpg|png)$', data))

Output:

None

Expected output:

http://www.google.com/a.jpg

I found many similar questions on Stack Overflow, but none of their answers worked for me, so this is not a duplicate. Please help me! Thanks.

Upvotes: 0

Views: 4370

Answers (3)

Will

Reputation: 24689

You were close!

Try this instead:

r'(ftp|http)://.*\.(jpg|png)'

You can visualize this pattern with a regex visualizer.

I would also make this non-greedy like this:

r'(ftp|http)://.*?\.(jpg|png)'

A regex visualizer also makes the greedy vs. non-greedy behavior easy to see.

By default, .* is greedy: it matches as much text as possible. Here you want it to match as little text as possible, so that the match ends at the first .jpg or .png rather than the last one.
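A minimal sketch of the difference, using a made-up input string that contains two image URLs (the string is an assumption for illustration, not taken from the question):

```python
import re

# Hypothetical input containing two image URLs separated by other text
data = "see http://a.com/x.jpg and http://b.com/y.png here"

# Greedy: .* runs as far as it can, then backtracks to the LAST extension
greedy = re.search(r'(ftp|http)://.*\.(jpg|png)', data)
# Non-greedy: .*? grows one character at a time and stops at the FIRST extension
lazy = re.search(r'(ftp|http)://.*?\.(jpg|png)', data)

print(greedy.group(0))  # http://a.com/x.jpg and http://b.com/y.png
print(lazy.group(0))    # http://a.com/x.jpg
```

With a single URL in the string both versions return the same text; the non-greedy form only matters once several candidate extensions appear.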

Your $ anchors the match to the end of the string, but in your example the URL is followed by >hhdhd, so the URL does not end where the string ends.
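A small sketch comparing the question's pattern with and without $. Both calls use re.search() (an assumption made here so that the anchor is the only difference between them):

```python
import re

data = "ahahahttp://www.google.com/a.jpg>hhdhd"

# With $, the extension must be the last thing in the string, but ">hhdhd" follows it
anchored = re.search(r'(ftp|http)://.*\.(jpg|png)$', data)
# Without $, the match can end wherever the extension appears
unanchored = re.search(r'(ftp|http)://.*\.(jpg|png)', data)

print(anchored)             # None
print(unanchored.group(0))  # http://www.google.com/a.jpg
```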

Another problem is that you're using re.match() instead of re.search(). re.match() only matches at the beginning of the string, whereas re.search() looks for a match anywhere in the string. See the Python documentation on search() vs. match() for more information.
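A minimal sketch of the match()/search() difference on the question's string, using the non-greedy pattern from this answer:

```python
import re

data = "ahahahttp://www.google.com/a.jpg>hhdhd"
pattern = r'(ftp|http)://.*?\.(jpg|png)'

# re.match() only tries position 0, where the string starts with "ahaha"
print(re.match(pattern, data))  # None

# re.search() scans forward and finds the URL starting at index 5
m = re.search(pattern, data)
print(m.group(0), m.start())    # http://www.google.com/a.jpg 5
```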

Upvotes: 4

Wang Wei Qiang

Reputation: 66

You should use re.search() instead of re.match():

import re
data = "ahahahttp://www.google.com/a.jpg>hhdhd"
url = re.search(r'(ftp|http)://.*\.(jpg|png)', data)
if url:
    print(url.group(0))
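re.search() returns only the first match. If the string might contain several URLs, one option (an extension beyond this answer, shown with a made-up input string) is re.findall(). When the pattern has capturing groups, findall() returns the groups instead of the whole match, so the sketch switches them to non-capturing (?:...) groups:

```python
import re

# Hypothetical input with more than one URL
data = "x http://a.com/1.jpg y ftp://b.org/2.png z"

# Capturing groups: findall() returns tuples of the groups, not the URLs
groups = re.findall(r'(ftp|http)://.*?\.(jpg|png)', data)
print(groups)  # [('http', 'jpg'), ('ftp', 'png')]

# Non-capturing groups: findall() returns the full matches
urls = re.findall(r'(?:ftp|http)://.*?\.(?:jpg|png)', data)
print(urls)    # ['http://a.com/1.jpg', 'ftp://b.org/2.png']
```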

Upvotes: 1

Sai Sriharsha Annepu

Reputation: 82

Find the start of the URL with find('http://') (or 'ftp://'), find the end of the URL with find('.jpg') (or '.png'), and then take the substring between them:

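data = "ahahahttp://www.google.com/a.jpg>hhdhd"
start = data.find('http://')  # index where the URL begins
rest = data[start:]           # everything from the URL onward
end = rest.find('.jpg')       # index of the extension within rest
print(rest[:end + len('.jpg')])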
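The snippet above only handles http:// and .jpg, and if either one is missing, find() returns -1 and the slice silently yields the wrong text. A rough generalization of the same find()-based idea, using a hypothetical helper named extract_url that checks both prefixes and both extensions and returns None when nothing is found:

```python
def extract_url(data, prefixes=('http://', 'ftp://'), exts=('.jpg', '.png')):
    """Hypothetical helper: return the first prefix...extension substring, or None."""
    # Earliest position of any prefix (str.find returns -1 when absent)
    starts = [i for i in (data.find(p) for p in prefixes) if i != -1]
    if not starts:
        return None
    start = min(starts)

    # Earliest extension occurring at or after the start of the URL
    ends = []
    for ext in exts:
        j = data.find(ext, start)
        if j != -1:
            ends.append((j, ext))
    if not ends:
        return None
    end, ext = min(ends)
    return data[start:end + len(ext)]


print(extract_url("ahahahttp://www.google.com/a.jpg>hhdhd"))  # http://www.google.com/a.jpg
print(extract_url("no url here"))                            # None
```

Like the original answer, this does not validate the URL itself; it just slices between the first known prefix and the first known extension after it.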

Upvotes: 0
