Reputation: 10849
I'm getting raw HTML back from a service and need to extract a URL from the string. Specifically, the URL is the value of an attribute called 'data-url' in one section of the HTML. Is there a way to extract just the URL immediately following 'data-url'? Here's an example:
let html_str = '<div class="tv-focusable" id="tv_web_answer_source" tabindex="-1" data-url="https://apple.stackexchange.com/questions/323174/does-the-iphone-8-have-any-sort-of-water-resistance-or-waterproof-manufacturing" onclick="onUrlClick(this)">'
I just need to strip out the domain and store it.
Upvotes: 0
Views: 696
Reputation: 7285
You can create a URL object from a string using new URL(text) and read the hostname property of that object. The only thing that remains is choosing how to extract the URL from the HTML.
Using regex
var html = '<div class="tv-focusable" id="tv_web_answer_source" tabindex="-1" data-url="https://apple.stackexchange.com/questions/323174/does-the-iphone-8-have-any-sort-of-water-resistance-or-waterproof-manufacturing" onclick="onUrlClick(this)">';
console.log(new URL(html.match(/data-url="([^"]*)"/)[1]).hostname);
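The one-liner above throws if the string has no data-url attribute or if the value is not an absolute URL. A slightly more defensive sketch of the same regex approach follows; the helper name hostnameFromDataUrl is hypothetical and not part of the original answer, and returning null on failure is an assumption about what the caller wants.

```javascript
// Hypothetical helper (not from the original answer): returns the hostname
// of the first double-quoted data-url attribute, or null if none is usable.
function hostnameFromDataUrl(html) {
  var match = html.match(/data-url="([^"]*)"/);
  if (!match) {
    return null; // no data-url attribute in the string
  }
  try {
    return new URL(match[1]).hostname;
  } catch (e) {
    return null; // the attribute value was not an absolute URL
  }
}

console.log(hostnameFromDataUrl('<div data-url="https://apple.stackexchange.com/questions/323174/example">'));
console.log(hostnameFromDataUrl('<div class="tv-focusable">'));
```

Returning null keeps the caller's error handling to a single check; throwing a descriptive error instead would be an equally reasonable choice.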
Using html
var html = '<div class="tv-focusable" id="tv_web_answer_source" tabindex="-1" data-url="https://apple.stackexchange.com/questions/323174/does-the-iphone-8-have-any-sort-of-water-resistance-or-waterproof-manufacturing" onclick="onUrlClick(this)">';
var element = document.createElement("div");
element.innerHTML = html;
var elementWithData = element.querySelector("[data-url]");
if (elementWithData) {
console.log(new URL(elementWithData.getAttribute("data-url")).hostname);
}
I would personally go with the HTML solution, since if (for whatever reason) the URL contains the text \", the regex will fail (though you could adjust the regex to account for that).
Also, if you need to support older browsers, you should use getAttribute over dataset. This only matters for old versions of IE (10 and earlier; IE 11 supports dataset).
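To illustrate how the regex approach could be loosened when a DOM is not available, here is a minimal sketch that accepts either quote style around the attribute value and decodes the &amp; entity. The helper name extractDataUrl is hypothetical, only &amp; is decoded, and this is still not a substitute for a real HTML parser.

```javascript
// Hypothetical helper (an illustration, not the answer's original code):
// reads data-url whether the value is wrapped in double or single quotes.
function extractDataUrl(html) {
  // Group 1 captures the opening quote; \1 requires the same quote to close the value.
  var match = html.match(/\sdata-url\s*=\s*(["'])(.*?)\1/);
  if (!match) {
    return null; // attribute not present
  }
  // Decode only the &amp; entity, which is common in URLs with query strings.
  return match[2].replace(/&amp;/g, "&");
}

console.log(extractDataUrl("<div data-url='https://example.com/?a=1&amp;b=2'>"));
```

Unquoted attribute values and other character references are deliberately left out of this sketch; handling them is where parsing the HTML through the DOM becomes the simpler option.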
Upvotes: 3
Reputation: 5148
Just use getAttribute
document.getElementById('tv_web_answer_source').getAttribute('data-url')
Even better, use dataset (because the attribute you want starts with data-)
document.getElementById('tv_web_answer_source').dataset.url
https://developer.mozilla.org/fr/docs/Web/API/HTMLElement/dataset
Upvotes: 2
Reputation: 207511
The easiest thing would be to use the DOM to get the information. Assign your HTML string to the innerHTML of a new element, select the element that has the attribute, and use dataset to read its value.
var div = document.createElement("div")
div.innerHTML = `<div class="tv-focusable" id="tv_web_answer_source" tabindex="-1" data-url="https://apple.stackexchange.com/questions/323174/does-the-iphone-8-have-any-sort-of-water-resistance-or-waterproof-manufacturing" onclick="onUrlClick(this)"></div>`
var str = div.querySelector('[data-url]').dataset.url
var host = new URL(str).hostname
console.log(host, str)
Upvotes: 2