Reputation: 87
I have this string:
http://pastebin.com/XXXXXXXhttp://pastebin.com/XXXXXX\r
I need to extract all the links from a single line that ends with \r. The line can contain anywhere from one to five links. So far I have something like this:
(http[s]*:.*)[\\r|h]
but it returns the whole line as one match. Any ideas?
Upvotes: 0
Views: 71
Reputation: 721
You don't need regex for this. Try this:
mylinks = []
with open('yourfile', 'r') as f:
    for line in f:
        # strip() drops the trailing \r; split() yields an empty piece before the first 'http'
        for link in line.strip().split('http'):
            if link:
                mylinks.append('http' + link)
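EDIT: It looks like you only need to process one string, not a whole file. In that case, just run:
mylinks = []
# strip() drops the trailing \r; split() yields an empty piece before the first 'http'
for link in mystring.strip().split('http'):
    if link:
        mylinks.append('http' + link)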
Upvotes: 0
Reputation: 785108
You can use this lookahead-based regex with findall:
>>> import re
>>> s='http://pastebin.com/XXXXXXXhttp://pastebin.com/XXXXXX\r'
>>> re.findall(r'https?://.+?(?=https?://|[\r\n]|$)', s)
['http://pastebin.com/XXXXXXX', 'http://pastebin.com/XXXXXX']
(?=https?://|[\r\n]|$) is a positive lookahead that asserts that the next position is followed by http:// or https://, by \r or \n, or by the end of the string.
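If the input has several such lines, the same pattern can be applied to the whole text in one call. This is only a sketch: the helper name find_links, the precompiled LINK_RE, and the extra YYYYYY line in the sample are assumptions for illustration, not part of the question.

```python
import re

# Lazily match from a scheme up to (but not including) the next scheme,
# a line break, or the end of the string.
LINK_RE = re.compile(r'https?://.+?(?=https?://|[\r\n]|$)')

def find_links(text):
    """Return every link found in text, in order of appearance."""
    return LINK_RE.findall(text)

# Illustrative sample: two lines, each terminated by \r\n, mixing http and https.
sample = ('http://pastebin.com/XXXXXXXhttps://pastebin.com/XXXXXX\r\n'
          'http://pastebin.com/YYYYYY\r\n')
links = find_links(sample)
```

The lazy .+? is what keeps each match from swallowing the link that follows it; a greedy .+ would run to the end of the line.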
Upvotes: 1
Reputation: 5927
Try this:
import re
va = 'http://pastebin.com/XXXXXXXhttp://pastebin.com/XXXXXX\r'
# findall returns only the captured group, so each result omits the http:// prefix
vac = re.findall(r"(?:https?:\/+)([^\r|h]+)", va)
print(vac)
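One thing to watch with a character class like [^\r|h]: inside brackets, | is a literal character rather than alternation, and excluding h cuts a link short at the first h after the scheme. A quick comparison, using a made-up example.com input (an assumption, not from the question) whose path contains an h:

```python
import re

# Hypothetical input: the first URL's path contains the letter 'h'.
s = 'http://example.com/hellohttp://example.com/world\r'

# Character-class version: stops at the first 'h', '|' or '\r' after the scheme.
by_class = re.findall(r'https?://[^\r|h]+', s)

# Lookahead version: stops only before the next scheme, a line break, or the end.
by_lookahead = re.findall(r'https?://.+?(?=https?://|[\r\n]|$)', s)
```

Here by_class truncates the first link to 'http://example.com/', while by_lookahead keeps both links whole. For the pastebin string in the question there is no h in the host or path, so both versions return the same links.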
Upvotes: 0