Reputation: 35
Hi, at the moment I am trying to parse some HTML news for our new fan page, because the company does not offer an RSS feed.
I have a new JS file with this in it:
function getNews() {
    var y = 0;                        // index of the table holding the current news item
    var news = new Array(7);          // one formatted string per news item
    var news_content = new Array(5);  // image src, two spans, paragraph text, link href
    for (var i = 0; i < news.length; i++)
    {
        var table = document.getElementById('news').contentWindow.getElementsByTagName('table')[y];
        news_content[0] = table.rows[0].cells[0].getElementsByTagName('img')[0].src;
        news_content[1] = table.rows[0].cells[1].getElementsByTagName('span')[0].innerHTML;
        news_content[2] = table.rows[0].cells[2].getElementsByTagName('span')[0].innerHTML;
        news_content[3] = table.rows[1].cells[0].getElementsByTagName('p')[0].innerHTML;
        news_content[4] = table.rows[0].cells[0].getElementsByTagName('a')[0].href;
        //alert(news[0] + "\n" + news[1] + "\n" + news[2] + "\n" + news[3] + "\n" + news[4]);
        news[i] = news_content[0] + "\n" + news_content[1] + "\n" + news_content[2] + "\n" + news_content[3] + "\n" + news_content[4] + "\n";
        y = y + 2; // move on to every second table
    }
    alert(news[0] + "\n" + news[1] + "\n" + news[2] + "\n" + news[3] + "\n" + news[4]);
}
and this HTML:
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
<script src="test.js"></script>
</head>
<body>
<a href="page.html" onclick="getNews()">Click here</a>
<iframe id="news" src="http://www.aerosoft-shop.com/list_news.php?cat=fs&lang=de"></iframe>
</body>
</html>
Finally, if I paste the source code of that page into the HTML file it works, but is there no way to parse it from an external page?
Upvotes: 0
Views: 2099
Reputation: 66
If you debug your code with a tool like Firebug, you will get an error message like this:
Permission denied to access property 'getElementsByTagName'
It is indeed not possible in JavaScript to access the content of an iframe which points to a different domain, not even a subdomain of your own domain (although, according to a comment on this answer, the subdomain case is possible).
The question here is whether the site owner wants you to crawl his site, or has at least given you an okay for it, because being crawled by other sources is generally not that welcome (traffic and possibly copyright problems).
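To make the restriction concrete, here is a minimal sketch of the comparison the browser effectively performs: two pages may script each other only when scheme, host, and port all match. The helper name isSameOrigin and the fan page address fanpage.example are assumptions made up for illustration; only the news URL comes from the question. It relies on the standard URL class, so it needs a reasonably recent browser or Node.js.

```javascript
// Hypothetical helper (not part of the question's code): returns true
// when both URLs share the same scheme, host, and port.
function isSameOrigin(urlA, urlB) {
  return new URL(urlA).origin === new URL(urlB).origin;
}

// Assumed address of the fan page; the question does not state it.
var fanPage = "http://fanpage.example/index.html";
// URL of the framed news page, taken from the question.
var newsPage = "http://www.aerosoft-shop.com/list_news.php?cat=fs&lang=de";

// Different hosts: the iframe's content is off limits to the outer page.
console.log(isSameOrigin(fanPage, newsPage)); // false

// Same scheme, host, and port: access would be allowed.
console.log(isSameOrigin("http://fanpage.example/a.html", "http://fanpage.example/b.html")); // true

// A subdomain counts as a different origin by default.
console.log(isSameOrigin("http://fanpage.example/", "http://news.fanpage.example/")); // false
```

This also explains why pasting the news markup into your own HTML file works: the tables then live in a document served from your own origin.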
Upvotes: 1