StardustGogeta

Reputation: 3406

How do I perform client-side web scraping with JavaScript?

How can this be done without violating the same-origin policy? I tried going through a proxy site, http://anyorigin.com/, but it did not work.

Upvotes: 0

Views: 315

Answers (1)

StardustGogeta

Reputation: 3406

EDIT 4/1/20 - Fixing nonfunctional code:

YQL has since stopped working, so the old answer below no longer runs.

A free service called CloudQuery can accomplish the same thing quite easily: it takes a page URL and a CSS selector as query parameters and returns the matching elements as JSON. Unfortunately, it allows only a very small number of calls in a given period of time.

var myUrl = "https://cloudquery.t9t.io/query?url=https%3A%2F%2Fstackoverflow.com%2Fusers%2F5732397%2Fstardustgogeta&selectors=*:nth-child(2)%20%3E%20*:nth-child(1)%20%3E%20*:nth-child(1)%20%3E%20*:nth-child(1)%20%3E%20*:nth-child(1)%20%3E%20*:nth-child(2)%20%3E%20*%20%3E%20*:nth-child(1)";

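// Parse the JSON response and show the text of the first matched element.
fetch(myUrl)
    .then(r => r.json())
    .then(r => { document.write(r.contents[0].innerText); })
    // The request can fail, for example once the call limit is reached.
    .catch(err => console.error("CloudQuery request failed:", err));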
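
The hand-encoded URL above is hard to read and easy to get wrong. As a rough sketch, it could be assembled from its readable parts instead. The endpoint and the `url` and `selectors` parameter names are copied from the URL above; the helper name `buildCloudQueryUrl` and the example selector are made up for illustration. Note that `encodeURIComponent` also escapes the `:` characters that the hand-written URL leaves literal, which decodes to the same value.

```javascript
// Endpoint and parameter names are taken from the URL in the answer above.
const CLOUDQUERY_ENDPOINT = "https://cloudquery.t9t.io/query";

// Hypothetical helper: build the request URL from a page URL and a CSS selector.
function buildCloudQueryUrl(pageUrl, selector) {
    return CLOUDQUERY_ENDPOINT +
        "?url=" + encodeURIComponent(pageUrl) +
        "&selectors=" + encodeURIComponent(selector);
}

const profileUrl = "https://stackoverflow.com/users/5732397/stardustgogeta";
// Placeholder selector for illustration only; use whatever your target page needs.
const requestUrl = buildCloudQueryUrl(profileUrl, "div > span:nth-child(2)");
```

The resulting `requestUrl` can be passed to `fetch` in place of `myUrl`.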

Old answer:

Yahoo's YQL (Yahoo Query Language) can fetch a page and use XPath to select elements from its HTML.

Simply include the following in your document:

<div id="a"></div>
<script>
    var yqlCallback = function(data){
        var rep = data.query.results.div;
        document.getElementById('a').innerHTML = "StardustGogeta's reputation is "+rep+'.';
    };
</script>
<script type='application/javascript' src="https://query.yahooapis.com/v1/public/yql?q=select%20content%20from%20html%20where%20url%3D'http%3A%2F%2Fstackoverflow.com%2Fusers%2F5732397%2Fstardustgogeta'%20and%20xpath%3D'%2F%2Fdiv%5B%40class%3D%22reputation%22%5D'&format=json&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=yqlCallback"></script>
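
This is the JSONP pattern: a `<script>` tag may load code from another origin, and the server wraps its JSON in a call to the function named by the `callback` parameter, so the same-origin policy is never violated. The sketch below only shows how the long `src` URL above is put together; since YQL no longer works, it is illustrative rather than usable. The helper name `buildYqlJsonpUrl` is made up, and it assumes the page URL and XPath contain no single quotes.

```javascript
// Endpoint and env value are copied from the script tag above.
const YQL_ENDPOINT = "https://query.yahooapis.com/v1/public/yql";
const YQL_ENV = "store://datatables.org/alltableswithkeys";

// Hypothetical helper. Assumes pageUrl and xpath contain no single quotes,
// because they are placed inside single-quoted YQL string literals.
function buildYqlJsonpUrl(pageUrl, xpath, callbackName) {
    const statement = "select content from html where url='" + pageUrl +
        "' and xpath='" + xpath + "'";
    return YQL_ENDPOINT +
        "?q=" + encodeURIComponent(statement) +
        "&format=json" +
        "&env=" + encodeURIComponent(YQL_ENV) +
        "&callback=" + encodeURIComponent(callbackName);
}

const yqlSrc = buildYqlJsonpUrl(
    "http://stackoverflow.com/users/5732397/stardustgogeta",
    '//div[@class="reputation"]',
    "yqlCallback"
);
```

With these arguments, `yqlSrc` is the same string as the `src` attribute used above.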

Upvotes: 0
