Lee Probert

Reputation: 10849

URL extraction from String in JavaScript

I'm getting raw HTML data back from a service and need to extract a URL from the string. Specifically, the URL sits in one section of the HTML as the value of an attribute called 'data-url'. Is there a way I can extract just the URL immediately following 'data-url'? Here's an example:

let html_str = '<div class="tv-focusable" id="tv_web_answer_source" tabindex="-1" data-url="https://apple.stackexchange.com/questions/323174/does-the-iphone-8-have-any-sort-of-water-resistance-or-waterproof-manufacturing" onclick="onUrlClick(this)">'

I just need to pull the domain out of that URL and store it.

Upvotes: 0

Views: 696

Answers (4)

nick zoum

Reputation: 7285

You can create a URL object from a string using new URL(text) and read the hostname property of that object. The only thing that remains is choosing how to extract the URL from the HTML.

Using regex

var html = '<div class="tv-focusable" id="tv_web_answer_source" tabindex="-1" data-url="https://apple.stackexchange.com/questions/323174/does-the-iphone-8-have-any-sort-of-water-resistance-or-waterproof-manufacturing" onclick="onUrlClick(this)">';

console.log(new URL(html.match(/data-url="([^"]*)"/)[1]).hostname);
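The one-liner above throws if the string has no data-url attribute (match returns null) or if the value is not a valid absolute URL (new URL throws a TypeError). A minimal sketch of a guarded version follows; the helper name hostnameFromDataUrl and the shortened sample strings are my own, not part of the question:

```javascript
// Hypothetical helper: returns the hostname of the first data-url attribute,
// or null when the attribute is missing or its value is not a valid absolute URL.
function hostnameFromDataUrl(html) {
  var match = html.match(/data-url="([^"]*)"/);
  if (!match) {
    return null; // no data-url attribute found in the string
  }
  try {
    return new URL(match[1]).hostname;
  } catch (e) {
    return null; // new URL() throws a TypeError for invalid input
  }
}

// Shortened sample input, for illustration only.
console.log(hostnameFromDataUrl('<div data-url="https://apple.stackexchange.com/questions/323174" onclick="onUrlClick(this)">'));
// → apple.stackexchange.com
console.log(hostnameFromDataUrl('<div class="tv-focusable">'));
// → null
```

Returning null keeps the caller in control of what to do when the service sends back unexpected markup.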

Using html

var html = '<div class="tv-focusable" id="tv_web_answer_source" tabindex="-1" data-url="https://apple.stackexchange.com/questions/323174/does-the-iphone-8-have-any-sort-of-water-resistance-or-waterproof-manufacturing" onclick="onUrlClick(this)">';

var element = document.createElement("div");
element.innerHTML = html;
var elementWithData = element.querySelector("[data-url]");
if (elementWithData) {
  console.log(new URL(elementWithData.getAttribute("data-url")).hostname);
}

I would personally go with the HTML solution: if (for whatever reason) the URL contains the text \", the regex will fail (though you could adjust the pattern to account for that).
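As a rough illustration of that pattern adjustment, the regex can be told to accept a backslash followed by any character inside the quoted value. This is only a sketch; the helper name extractDataUrl and the example.com inputs are made up for the demonstration:

```javascript
// Hypothetical helper: captures the raw data-url value, allowing
// backslash-escaped characters (such as \") inside the double quotes.
function extractDataUrl(html) {
  var match = html.match(/data-url="((?:[^"\\]|\\.)*)"/);
  return match ? match[1] : null;
}

var plain = '<div data-url="https://example.com/page" onclick="f(this)">';
var escaped = '<div data-url="https://example.com/a\\"b" onclick="f(this)">';

console.log(extractDataUrl(plain));   // → https://example.com/page
console.log(extractDataUrl(escaped)); // → https://example.com/a\"b
```

The simpler [^"]* pattern would stop at the escaped quote in the second input and return a truncated value.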

Also, if you want ES5 compatibility, you should use getAttribute rather than dataset. This only matters for older versions of IE (10 and earlier), since IE 11 supports dataset.

Upvotes: 3

Arthur

Reputation: 5148

If the element is already part of the page, just use getAttribute:

document.getElementById('tv_web_answer_source').getAttribute('data-url')

Even better, use dataset (because the attribute you want starts with data-):

document.getElementById('tv_web_answer_source').dataset.url

https://developer.mozilla.org/fr/docs/Web/API/HTMLElement/dataset
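For reference, dataset exposes each data-* attribute under a camelCased key with the data- prefix removed, which is why data-url becomes dataset.url. The function below is only a sketch of that naming rule for illustration (datasetKey is a hypothetical name, not a browser API):

```javascript
// Hypothetical helper mimicking how a data-* attribute name maps to a dataset key:
// drop the "data-" prefix, then turn each "-x" into "X".
function datasetKey(attributeName) {
  return attributeName
    .replace(/^data-/, "")
    .replace(/-([a-z])/g, function (_, letter) {
      return letter.toUpperCase();
    });
}

console.log(datasetKey("data-url"));      // → url
console.log(datasetKey("data-video-id")); // → videoId
```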

Upvotes: 2

epascarello

Reputation: 207511

The easiest thing would be to use the DOM to get the information. Assign your HTML string to a new element's innerHTML, select the child that has the attribute, and use dataset to read its value.

var div = document.createElement("div")
div.innerHTML = `<div class="tv-focusable" id="tv_web_answer_source" tabindex="-1" data-url="https://apple.stackexchange.com/questions/323174/does-the-iphone-8-have-any-sort-of-water-resistance-or-waterproof-manufacturing" onclick="onUrlClick(this)"></div>`
var str = div.querySelector('[data-url]').dataset.url
var host = new URL(str).hostname
console.log(host, str)

Upvotes: 2

Alan

Reputation: 976

Maybe use

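// A string passed to split is matched literally in JavaScript (not as a regex), so split in two steps
var url = s.split('data-url="')[1].split('"')[0];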

Upvotes: 0
