Reputation: 2738
I have a string containing many URLs to pages and images:
La-la-la https://example.com/ la-la-la https://example.com/example.PNG
And I need to convert it to:
La-la-la <a href="https://example.com/">https://example.com/</a> la-la-la <img src="https://example.com/example.PNG">
Image formats are unpredictable (they can be .png, .JPEG, etc.), and any link can occur multiple times per string.
I know there are some JavaScript examples on this site, but I can't work out how to convert them to Python.
But I found this as a starting point:
url_regex = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig
img_regex = /^ftp|http|https?:\/\/(?:[a-z\-]+\.)+[a-z]{2,6}(?:\/[^\/#?]+)+\.(?:jpe?g|gif|png)$/ig
Thanks a lot for any help.
Upvotes: 2
Views: 619
Reputation: 26094
You may use the following regular expression:
(https?.*?\.com\/)(\s+[\w-]*\s+)(https?.*?\.com\/[\w\.]+)
(https?.*?\.com\/): First capture group. Captures http or https, anything up to .com, and a forward slash /.
(\s+[\w-]*\s+): Second capture group. Captures whitespace, alphanumeric characters and hyphens, and whitespace. You can add more characters to the character set if needed.
(https?.*?\.com\/[\w\.]+): Third capture group. Captures http or https, anything up to .com, a forward slash /, then alphanumeric characters and full stops . for the file name and extension. Again, you can add more characters to the character set in this capture group if you are expecting other characters.
You can test the regex live here.
Alternatively, if you are expecting variable urls and domains you may use:
(\w*\:.*?\.\w*\/)(\s+[\w-]*\s+)(\w*\:?.*?\.\w*\/[\w\.]+)
Here the first and third capture groups match any alphanumeric characters followed by a colon :, then anything up to a full stop ., alphanumeric characters \w, and a forward slash. You can test this here.
You may replace captured groups with:
<a href="\1">\1</a>\2<img src="\3">
Where \1, \2, and \3 are backreferences to capture groups one, two and three respectively.
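As a quick check, the more permissive pattern above can be combined with the same replacement template. This is only a sketch: the variable names `pattern`, `template`, `text` and `result` are illustrative, and the pattern still assumes the exact "link, some text, image" layout from the question.

```python
import re

# The more permissive pattern from this answer (any scheme, any top-level domain).
pattern = r'(\w*\:.*?\.\w*\/)(\s+[\w-]*\s+)(\w*\:?.*?\.\w*\/[\w\.]+)'

# Replacement template using backreferences to groups 1, 2 and 3.
template = r'<a href="\1">\1</a>\2<img src="\3">'

text = "La-la-la https://example.com/ la-la-la https://example.com/example.PNG"
result = re.sub(pattern, template, text)
print(result)
```

On the sample string this should give the same result as the stricter .com pattern.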
Python snippet:
>>> import re
>>> # Use a name other than str so the built-in type is not shadowed.
>>> text = "La-la-la https://example.com/ la-la-la https://example.com/example.PNG"
>>> out = re.sub(r'(https?.*?\.com\/)(\s+[\w-]*\s+)(https?.*?\.com\/[\w\.]+)',
...              r'<a href="\1">\1</a>\2<img src="\3">',
...              text)
>>> print(out)
La-la-la <a href="https://example.com/">https://example.com/</a> la-la-la <img src="https://example.com/example.PNG">
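Since the question mentions unpredictable image formats and links that may appear several times, one possible generalisation is to match each URL on its own and let a replacement function decide between an anchor and an image tag. The following is a minimal sketch under stated assumptions, not a definitive implementation: the extension list, the simplified URL pattern, and the names `IMAGE_EXTENSIONS`, `URL_PATTERN`, `to_tag` and `linkify` are all hypothetical and not part of the original answer.

```python
import re

# Assumed, non-exhaustive list of image extensions; extend as needed.
IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg', '.gif', '.bmp', '.webp', '.svg')

# Simplified URL pattern (an assumption): a scheme followed by any run of
# non-whitespace characters.
URL_PATTERN = re.compile(r'(?:https?|ftp)://\S+', re.IGNORECASE)


def to_tag(match):
    """Return an <img> tag for image URLs and an <a> tag for everything else."""
    url = match.group(0)
    # Compare in lower case so .PNG, .Jpeg, etc. are all recognised.
    if url.lower().endswith(IMAGE_EXTENSIONS):
        return '<img src="{}">'.format(url)
    return '<a href="{0}">{0}</a>'.format(url)


def linkify(text):
    """Replace every URL in text, however many there are."""
    return URL_PATTERN.sub(to_tag, text)


print(linkify("La-la-la https://example.com/ la-la-la https://example.com/example.PNG"))
```

Note that this sketch treats trailing punctuation as part of the URL, does not recognise image URLs that carry a query string after the extension, and does not HTML-escape the URL, so it would likely need tightening for real input.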
Upvotes: 1
Reputation: 7412
You can do this without regex, if you want.
# Assumes the string has exactly four whitespace-separated parts: text, URL, text, image URL.
text = 'La-la-la https://example.com/ la-la-la https://example.com/example.PNG'
template = '{f_txt} <a href="{f_url}">{f_url}</a> {s_txt} <img src="{s_url}">'
f_txt, f_url, s_txt, s_url = text.split()
print(template.format(f_txt=f_txt, f_url=f_url, s_txt=s_txt, s_url=s_url))
Output
La-la-la <a href="https://example.com/">https://example.com/</a> la-la-la <img src="https://example.com/example.PNG">
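The unpacking above only works when the string has exactly four whitespace-separated parts. If the number and order of links vary, the same regex-free idea could be applied token by token, roughly as sketched below. The names `IMAGE_EXTENSIONS`, `URL_PREFIXES` and `convert`, as well as the extension list, are assumptions made for illustration.

```python
# Assumed, non-exhaustive list of image extensions; extend as needed.
IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg', '.gif', '.bmp', '.webp', '.svg')
# Assumed set of URL schemes to recognise.
URL_PREFIXES = ('http://', 'https://', 'ftp://')


def convert(text):
    """Wrap every URL token in an <a> or <img> tag, leaving other words as-is."""
    parts = []
    for token in text.split():
        lowered = token.lower()  # case-insensitive checks for scheme and extension
        if not lowered.startswith(URL_PREFIXES):
            parts.append(token)
        elif lowered.endswith(IMAGE_EXTENSIONS):
            parts.append('<img src="{}">'.format(token))
        else:
            parts.append('<a href="{0}">{0}</a>'.format(token))
    # Joining with single spaces collapses any runs of whitespace in the input.
    return ' '.join(parts)


print(convert('La-la-la https://example.com/ la-la-la https://example.com/example.PNG'))
```

Because it splits on whitespace, this version does not preserve the original spacing or line breaks, and a URL followed directly by punctuation would keep that punctuation inside the tag.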
Upvotes: 1