Reputation: 3075
I'm trying to scrape a website, but when I request the page through YQL, I get redirected to the website's homepage instead of the page I'm trying to get content from.
Does anybody know how I can prevent my request from being redirected, or any other way to avoid this issue?
Here is a link to the request I'm trying to perform, which is failing:
Target site:
http://gticket.imagix.be/os1.aspx
Request in Yahoo Console:
http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22http%3A%2F%2Fgticket.imagix.be%2Fos1.aspx%22&diagnostics=true
Upvotes: 1
Views: 691
Reputation: 146201
It's not because of YQL; the URL itself returns a 302 redirect. If you paste this URL directly into the browser's address bar or click it, you can see that it redirects to the site's home page, and you can't prevent that.
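To confirm that the redirect comes from the site rather than from YQL, you can send one request yourself without following redirects and look at the status code and the Location header. The following is only a minimal sketch using Python's standard library; the names probe, is_redirect and REDIRECT_CODES are made up for illustration and are not part of YQL.

```python
import http.client
from urllib.parse import urlsplit

# Status codes that tell the client to go to another URL.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def probe(url, timeout=10):
    """Send a single GET and return (status, Location header).

    http.client never follows redirects on its own, so the values
    returned are the ones the server sent for this exact URL.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def is_redirect(status):
    """Return True if the status code is an HTTP redirect."""
    return status in REDIRECT_CODES
```

Calling probe("http://gticket.imagix.be/os1.aspx") and passing the returned status to is_redirect shows whether the server itself answers with a redirect and where it points.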
This is the YQL result of the page after redirection.
Update:
Also remember that if a website chooses to block YQL using a robots.txt directive, you won't be allowed to access it. So a site can reject a YQL request if it has been set up that way; here is an article about blocking YQL.
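To see how such a robots.txt block behaves, here is a small sketch using Python's urllib.robotparser. The sample rules and the user-agent string "Yahoo Pipes 2.0" are assumptions made for illustration (check the site's real robots.txt and the YQL documentation for the actual values), and is_allowed is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Assumed example rules: block one crawler entirely, say nothing about others.
SAMPLE_ROBOTS_TXT = """\
User-agent: Yahoo Pipes 2.0
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network
    # access is needed when the text is already at hand.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these sample rules, is_allowed(SAMPLE_ROBOTS_TXT, "Yahoo Pipes 2.0", "http://gticket.imagix.be/os1.aspx") is False, while the same call with a user agent the rules do not mention is True.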
Upvotes: 1
Reputation: 1400
There is a followRedirects option in YQL that you can use. Check here.
Upvotes: 0