Reputation: 185
I'm trying to scrape one specific hotel's page for rates for the coming 28 days. I suspect I'm being blocked, but I'm not quite sure.
I get some results, but not all. I've even tried various user agents, a download_delay of 30, httpcache enabled, etc.
This is my Lua script:
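    function main(splash, args)
        splash.private_mode_enabled = false
        splash.js_enabled = true
        splash.images_enabled = false  -- skip images to speed up rendering
        assert(splash:go(args.url))

        -- Poll until condition() returns true, sleeping 20 s between checks.
        -- Note: there is no upper bound here, so if the condition never
        -- becomes true this loops until Splash's own timeout kicks in.
        function wait_for(splash, condition)
            while not condition() do
                splash:wait(20.0)
            end
        end

        -- Wait for the availability table to appear in the DOM.
        wait_for(splash, function()
            return splash:evaljs("document.querySelector('ul.availability-table-revamp') != null")
        end)

        -- Extra fixed pause after the table has appeared.
        assert(splash:wait(30.0))
        splash:set_viewport_full()
        return {
            html = splash:html(),
        }
    end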
The page I'm crawling is [here][1].
How do I know for sure that it's the site blocking me? There's no policy on the hotel's pages, but there is (of course) one on the engine's main page ...
I do of course have more code to show, but my guess is that the only thing that can remedy this is the Lua script. But if you want to see more, the complete code is here :-)
Sure hope you're smarter than me (I guess I already know the answer to that, though).
Upvotes: 1
Views: 217
Reputation: 799
Sometimes websites block the user's IP. Try using different proxy servers, since the page is accessible from my system.
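
As a rough sketch of what rotating proxies could look like on the Python side: the helper below cycles through a list of proxy URLs and builds the argument dict for each Splash request. The proxy URLs and the `splash_payloads` name are made up for illustration. It assumes Splash's documented `proxy` request argument is honoured for your endpoint, so check that against your Splash version. With scrapy-splash, a dict like this would typically be passed as `args=` to `SplashRequest`.

```python
import itertools

# Hypothetical proxy URLs; replace them with proxies you actually have access to.
PROXIES = [
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080",
]


def splash_payloads(url, lua_source, proxies):
    """Yield Splash argument dicts for `url`, cycling through `proxies` forever."""
    for proxy in itertools.cycle(proxies):
        yield {
            "url": url,                # page to render
            "lua_source": lua_source,  # the Lua script from the question
            "proxy": proxy,            # assumed: Splash's `proxy` argument
        }


if __name__ == "__main__":
    payloads = splash_payloads(
        "http://example.com/hotel",     # placeholder URL
        "function main(splash) end",    # placeholder script
        PROXIES,
    )
    # Take one payload per request; each uses the next proxy in the list.
    for _ in range(3):
        print(next(payloads)["proxy"])
```

If the missing results show up once requests go out through a different IP, that is a fairly strong hint that the original IP was being blocked or throttled.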
Upvotes: 1