user1094081
user1094081

Reputation:

Extract links in a string and return an array of objects

I receive a string from a server and this string contains text and links (mainly starting with http://, https:// and www., very rarely different but if they are different they don't matter).

Example:

"simple text simple text simple text domain.ext/subdir again text text text youbank.com/transfertomealltheirmoney/witharegex text text text and again text"

I need a JS function that does the following: - finds all the links (no matter if there are duplicates); - returns an array of objects, each representing a link, together with keys that return where the link starts in the text and where it ends, something like:

[{link:"http://www.dom.ext/dir",startsAt:25,endsAt:47},
{link:"https://www.dom2.ext/dir/subdir",startsAt:57,endsAt:88},
{link:"www.dom.ext/dir",startsAt:176,endsAt:192}]

Is this possible? How?

EDIT: @Touffy: I tried this but I could not get how long is any string, only the starting index. Moreover, this does not detect www: var str = string with many links (SO does not let me post them)" var regex =/(\b(https?|ftp|file|www):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig; var result, indices = []; while ( (result = regex.exec(str)) ) { indices.push({startsAt:result.index}); }; console.log(indices[0].link);console.log(indices[1].link);

Upvotes: 5

Views: 6233

Answers (2)

Femi
Femi

Reputation: 1

const getLinksPool = (links) => { //you can replace the https with any links like http or www

const linksplit = links.replace(/https:/g, " https:");
let linksarray = linksplit.split(" ");
let linkspools = linksarray.filter((array) => {
return array !== "";
});

return linkspools;

};

Upvotes: 0

AmmarCSE
AmmarCSE

Reputation: 30557

One way to approach this would be with the use of regular expressions. Assuming whatever input, you can do something like

 var expression = /(https?:\/\/(?:www\.|(?!www))[^\s\.]+\.[^\s]{2,}|www\.[^\s]+\.[^\s]{2,})/gi;
 var matches = input.match(expression);

Then, you can iterate through the matches to discover there starting and ending points with the use of indexOf

for(match in matches)
    {
        var result = {};
        result['link'] = matches[match];
        result['startsAt'] = input.indexOf(matches[match]);
        result['endsAt'] = 
            input.indexOf(matches[match]) + matches[match].length;
     }

Of course, you may have to tinker with the regular expression itself to suit your specific needs.

You can see the results logged by console in this fiddle

Upvotes: 14

Related Questions