CODEWITHSUNDEEP

Reputation: 87

Regex to find links in one row

I have this string:

http://pastebin.com/XXXXXXXhttp://pastebin.com/XXXXXX\r

I need to extract all links from a single line that ends with \r. The line may contain anywhere from one to five links. So far I have this:

(http[s]*:.*)[\\r|h]

but it returns the whole row as one match. Any ideas?

Upvotes: 0

Views: 71

Answers (4)

Reputation: 721

You don't need regex for this. Try this:

mylinks = []
with open('yourfile', 'r') as f:
    for line in f:
        # strip() drops the trailing \r; split() yields an empty first piece, so skip it
        for link in line.strip().split('http'):
            if link:
                mylinks.append('http' + link)

EDIT: It looks like you only need to handle one string, not the whole file. In that case just run:

mylinks = []
for link in mystring.strip().split('http'):
    if link:  # skip the empty piece before the first 'http'
        mylinks.append('http' + link)

Upvotes: 0

Reputation: 786289

You can use this lookahead based regex in findall:

>>> import re
>>> s='http://pastebin.com/XXXXXXXhttp://pastebin.com/XXXXXX\r'
>>> re.findall(r'https?://.+?(?=https?://|[\r\n]|$)', s)
['http://pastebin.com/XXXXXXX', 'http://pastebin.com/XXXXXX']

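(?=https?://|[\r\n]|$) is a positive lookahead that asserts the next position is followed by http:// or https://, by \r or \n, or by the end of the string. The lazy .+? therefore stops each match just before the next link begins.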
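
As a minimal sketch of how this lookahead pattern might be wrapped for reuse, the snippet below compiles it once and applies it to rows holding one or several links. The helper name extract_links and the example.com URLs are assumptions made for illustration; only the pattern itself comes from the answer.

```python
import re

# Pattern taken from the answer above; the names LINK_RE and extract_links are hypothetical.
LINK_RE = re.compile(r'https?://.+?(?=https?://|[\r\n]|$)')

def extract_links(row):
    """Return every http/https link packed back to back in a single row."""
    return LINK_RE.findall(row)

# The sample row from the question (two links, terminated by a carriage return).
print(extract_links('http://pastebin.com/XXXXXXXhttp://pastebin.com/XXXXXX\r'))

# A made-up row mixing http and https links.
print(extract_links('https://a.example.com/1http://b.example.com/2\r'))
```

Because the match is lazy and the lookahead consumes nothing, each link ends exactly where the next scheme starts, and the trailing \r is left out of the last link.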

Upvotes: 1

Reputation: 5542

Give this a try: (https?:\/\/[^\\r|h]+)
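
One caveat worth checking before relying on this pattern: a character class lists single characters, not sequences. Assuming the pattern is used as a Python raw string (the answer does not say which language or quoting it targets), [^\\r|h] excludes a backslash, the letter r, a pipe, and the letter h. It appears to work on the sample mainly because pastebin.com/XXXXXXX contains none of those characters. The sketch below uses a hypothetical example.com URL to show the effect.

```python
import re

# The pattern from this answer, written as a raw string (an assumption about its intended use).
pattern = re.compile(r'(https?:\/\/[^\\r|h]+)')

# Sample row from the question, with a real carriage return at the end.
sample = 'http://pastebin.com/XXXXXXXhttp://pastebin.com/XXXXXX\r'
print(pattern.findall(sample))

# Hypothetical URL whose path contains the letters "r" and "h".
tricky = 'http://example.com/search\r'
print(pattern.findall(tricky))
```

Under these assumptions the second call stops at the first r in the path, and in the first call the real carriage return is not excluded by the class, so it stays attached to the last link.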

Upvotes: 0

mkHun

Reputation: 5921

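Try this:

import re

va = 'http://pastebin.com/XXXXXXXhttp://pastebin.com/XXXXXX\r'
# The scheme is in a non-capturing group, so findall returns only the part after it.
# [^\r|h]+ stops at a carriage return, "|" or any "h", so a URL containing "h" is cut short.
vac = re.findall(r"(?:https?:\/+)([^\r|h]+)", va)
print(vac)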

Upvotes: 0
