Siddharth Thakor

Reputation: 156

Regex for finding link

I need a regex that finds links, subject to some conditions. Here is the scenario:

I have created a TypeScript utility, utils.ts. It takes an API response as input and returns formatted, HTML-ready text: bold text, emails, images, and links.

Here is one scenario I am facing.

utils.ts currently returns this:

    https://www.google.com <a href="https://www.youtube.ca" target="_blank">Click here</a>

(Note: plain links and 'a' tag links can occur in any order.)

In the text above, the part <a href="https://www.youtube.ca" target="_blank">Click here</a> is already valid HTML, so the GUI shows:

https://www.google.com Click here

From here, I want a regex that formats https://www.google.com but does not touch <a href="https://www.youtube.ca" target="_blank">Click here</a>, because that part is already formatted.

I also want https://www.google.com to be formatted as a link that displays as:

Google

The main problem: when I replace strings starting with 'https://' with anchor tags, the URLs inside existing 'href' attributes are replaced as well, which produces this:

<a href="https://www.google.com">Google</a> <a href="<a href="https://www.youtube.com">Google</a>">Click me</a>

Which is what I don't want.
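A minimal reproduction of this problem, assuming the naive approach is a single global replace that wraps every `http(s)://` URL in an anchor whose text is Google. The helper name `naiveLinkify` and its URL pattern are hypothetical, chosen only to show why the `href` value gets wrapped too: the regex has no way to know it is already inside an attribute.

```typescript
// Hypothetical naive formatter: wraps EVERY http(s) URL in an anchor,
// including URLs that already sit inside an href attribute.
function naiveLinkify(text: string): string {
  // Assumed URL pattern: stop at whitespace, a double quote or '<'.
  const urlPattern = /https?:\/\/[^\s"<]+/g;
  return text.replace(urlPattern, (url) => `<a href="${url}">Google</a>`);
}

const input = 'https://www.google.com <a href="https://www.youtube.com">Click me</a>';
console.log(naiveLinkify(input));
// <a href="https://www.google.com">Google</a> <a href="<a href="https://www.youtube.com">Google</a>">Click me</a>
```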

Please share your thoughts on this.

Thank you

Upvotes: 0

Views: 105

Answers (2)

Slawomir Dziuba

Reputation: 1325

If I understand correctly, you want to extract the web addresses that appear as plain text and are not already links. If so, check out the following JavaScript:

    // the data:
    const txt1 = 'https://www.google.com <a href="https://www.youtube.ca" target="_blank">Click here</a> http://other.domain.com';

    // strip HTML tags, replacing each tag with a space so adjacent words stay separated
    // (a plain function instead of extending String.prototype)
    function stripHTML(text) {
        const reTag = /<(?:.|\s)*?>/g;
        return text.replace(reTag, " ");
    }
    const txt2 = stripHTML(txt1);
    // console.log(txt2);

    // split into tokens on runs of whitespace (avoids empty tokens)
    const tokens = txt2.split(/\s+/);
    // console.log(tokens);

    // build an address table: keep only tokens that start with http:// or https://
    const regex2 = /^https?:\/\/.*/;
    const addresses = [];
    for (const token of tokens) {
        if (regex2.test(token)) {
            addresses.push(token);
        }
    }
    console.log(addresses); // ['https://www.google.com', 'http://other.domain.com']

Upvotes: 2

AndreyCh

Reputation: 1403

Links that are not yet formatted can be found using alternation. The idea: if a link is already formatted (it appears inside href="..."), it is matched but not captured to a group. Don't be confused by the fact that the regex still finds a match there; only look at Group 1. Otherwise, the link is captured to Group 1.

The regex below is deliberately simple, just to explain the idea. You might want to replace it with a better URL search pattern.


    (?:href="https?\S+")|(https?\S+)
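A minimal TypeScript sketch of how this alternation might be used in utils.ts. `String.prototype.replace` calls the callback for every match; when Group 1 is `undefined`, the match came from the `href="..."` branch and is returned unchanged, otherwise the bare URL is wrapped in an anchor. The function name `formatPlainLinks` and the anchor markup (URL as link text, `target="_blank"`) are assumptions for illustration only.

```typescript
// The answer's pattern: branch 1 matches an existing href="..." without
// capturing; branch 2 captures a bare URL in Group 1.
const LINK_PATTERN = /(?:href="https?\S+")|(https?\S+)/g;

// Hypothetical helper: wraps only bare URLs and leaves formatted links untouched.
function formatPlainLinks(text: string): string {
  return text.replace(
    LINK_PATTERN,
    (match: string, bareUrl: string | undefined): string => {
      if (bareUrl === undefined) {
        // Matched inside href="...": already formatted, keep as-is.
        return match;
      }
      // Bare URL captured in Group 1: turn it into a link (assumed markup).
      return `<a href="${bareUrl}" target="_blank">${bareUrl}</a>`;
    },
  );
}

const sample =
  'https://www.google.com <a href="https://www.youtube.ca" target="_blank">Click here</a>';
console.log(formatPlainLinks(sample));
// <a href="https://www.google.com" target="_blank">https://www.google.com</a> <a href="https://www.youtube.ca" target="_blank">Click here</a>
```

As the answer notes, the pattern is only a starting point: because `\S+` does not stop at `<`, a URL used as the visible text of an existing anchor (for example `<a href="https://x">https://x</a>`) would be caught by the second branch together with the trailing `</a>`, and trailing punctuation after a bare URL is also swallowed.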

Upvotes: 2
