Reputation: 156
I need a regex that matches links under certain conditions. Here is the scenario:
I have written a TypeScript utility, utils.ts. It takes an API response as input and returns HTML-formatted text: bold text, email addresses, images, and links.
Here is the case I am stuck on.
For one input, utils.ts returns this:
https://www.google.com <a href="https://www.youtube.ca" target="_blank">Click here</a>
(Note: plain links and `a` tag links can occur in any order.)
In that text, the part <a href="https://www.youtube.ca" target="_blank">Click here</a>
is already valid HTML.
So the GUI shows the following:
https://www.google.com Click here
From here, I want a regex that formats the bare https://www.google.com as a link
but does not touch <a href="https://www.youtube.ca" target="_blank">Click here</a>,
because that part is already formatted.
In other words, I want to format https://www.google.com
as follows:
The main problem is that when I replace every string starting with 'https://' with an anchor tag, the URLs inside existing `href` attributes get replaced too, like this:
<a href="https://www.google.com">Google</a> <a href="<a href="https://www.youtube.com">Google</a>">Click me</a>
That is not what I want.
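For illustration, here is a minimal TypeScript reproduction of that failure. It assumes the replacement is a single global `String.prototype.replace` call (the actual utils.ts code is not shown), and `naiveLinkify` is a hypothetical name.

```typescript
// Illustrative reproduction only (not the actual utils.ts code):
// one global replace that wraps every http(s) URL in an anchor tag.
function naiveLinkify(text: string): string {
  return text.replace(/https?:\/\/[^\s"<]+/g, (url) => `<a href="${url}">${url}</a>`);
}

const input =
  'https://www.google.com <a href="https://www.youtube.ca" target="_blank">Click here</a>';

// The URL inside href="..." is matched as well, so the existing link
// ends up with a second anchor nested inside its attribute value.
console.log(naiveLinkify(input));
// <a href="https://www.google.com">https://www.google.com</a> <a href="<a href="https://www.youtube.ca">https://www.youtube.ca</a>" target="_blank">Click here</a>
```

The regex alone cannot tell whether a URL sits in plain text or inside an attribute, which is why the existing link gets corrupted.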
Any thoughts on how to solve this?
Thank you
Upvotes: 0
Views: 105
Reputation: 1325
If I understand correctly, you want to extract the web addresses that appear as plain text and are not already links. If so, check out the following JavaScript:
// the data:
var txt1 = 'https://www.google.com <a href="https://www.youtube.ca" target="_blank">Click here</a> http://other.domain.com';

// strip HTML tags, replacing each tag with a space
// (a plain function instead of extending String.prototype)
function stripHTML(str) {
  var reTag = /<(?:.|\s)*?>/g;
  return str.replace(reTag, " ");
}
var txt2 = stripHTML(txt1);
//console.log(txt2);

// split into whitespace-separated tokens
var tokens = txt2.split(/\s+/);
//console.log(tokens);

// build an address table from the tokens that start with http:// or https://
var regex2 = /^https?:\/\/.*/;
var addresses = [];
for (var i = 0; i < tokens.length; i++) {
  if (regex2.test(tokens[i])) {
    addresses.push(tokens[i]);
  }
}
console.log(addresses); // [ 'https://www.google.com', 'http://other.domain.com' ]
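The code above finds the bare addresses but does not put them back into the HTML. A possible extension, sketched in TypeScript under stated assumptions, splits the string into tags and text instead of stripping the tags, and wraps URLs only in text segments that are not inside an existing `<a>` element. The name `linkifyBareUrls` is hypothetical, and the sketch assumes well-formed, non-nested anchors and no `>` inside attribute values; a real HTML parser is more robust.

```typescript
// Hypothetical helper: wrap bare URLs in anchors while leaving existing markup,
// and the visible text of existing <a> elements, untouched.
// Assumes well-formed, non-nested anchors and no '>' inside attribute values.
function linkifyBareUrls(html: string): string {
  // Splitting on a capturing group keeps the tags in the array:
  // even indexes hold text, odd indexes hold tags.
  const parts = html.split(/(<[^>]*>)/);
  let insideAnchor = false;

  return parts
    .map((part, index) => {
      if (index % 2 === 1) {
        // A tag: track whether we are entering or leaving an <a> element.
        if (/^<a\b/i.test(part)) insideAnchor = true;
        else if (/^<\/a\s*>/i.test(part)) insideAnchor = false;
        return part;
      }
      // Text inside an existing link is already formatted.
      if (insideAnchor) return part;
      // Plain text: wrap each bare URL in an anchor tag.
      return part.replace(/https?:\/\/[^\s<]+/g, (url) => `<a href="${url}">${url}</a>`);
    })
    .join("");
}

const sample =
  'https://www.google.com <a href="https://www.youtube.ca" target="_blank">Click here</a> http://other.domain.com';
console.log(linkifyBareUrls(sample));
// <a href="https://www.google.com">https://www.google.com</a> <a href="https://www.youtube.ca" target="_blank">Click here</a> <a href="http://other.domain.com">http://other.domain.com</a>
```

Because tags are never passed to the URL regex, `href` values cannot be double-wrapped, and the order of plain and `a` tag links does not matter.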
Upvotes: 2
Reputation: 1403
Links that are not yet formatted can be found using an alternation. The idea is that if a link is already formatted (it appears as href="..."), the first alternative matches it without capturing anything into a group. Don't be confused that the regex still finds a match in that case; you should only look at Group 1. Otherwise, the bare link is captured in Group 1.
The regex below is deliberately simple, just to explain the idea. You might want to replace `https?\S+` with a better URL pattern.
(?:href="https?\S+")|(https?\S+)
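One way to apply this in TypeScript is a replacement callback: when Group 1 is undefined, the match was an existing `href="..."` and is returned unchanged; otherwise the captured URL is wrapped in an anchor. `formatBareLinks` is a hypothetical name, and the output format `<a href="...">...</a>` is an assumption.

```typescript
// Alternation: existing href="..." values match the first, non-capturing
// alternative; bare URLs are captured in group 1.
const LINK_PATTERN = /(?:href="https?\S+")|(https?\S+)/g;

function formatBareLinks(text: string): string {
  return text.replace(LINK_PATTERN, (match: string, bareUrl?: string) =>
    // Group 1 is undefined when the href="..." alternative matched.
    bareUrl ? `<a href="${bareUrl}">${bareUrl}</a>` : match
  );
}

const input =
  'https://www.google.com <a href="https://www.youtube.ca" target="_blank">Click here</a>';
console.log(formatBareLinks(input));
// <a href="https://www.google.com">https://www.google.com</a> <a href="https://www.youtube.ca" target="_blank">Click here</a>
```

Because `\S+` is used, a URL immediately followed by markup (for example `https://a.com</p>`) captures the markup too, a URL used as the visible text of an existing anchor is still captured, and single-quoted `href` values are not recognized.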
Upvotes: 2