Jonas__G

Reputation: 185

Scrapy Splash results in 504

I'm trying to scrape one specific hotel's page for rates for the coming 28 days. I suspect I'm being blocked, but I'm not quite sure.

I get some results, but not all. I've even tried various user agents, a download_delay of 30, httpcache enabled, etc.

This is my Lua script:

    function main(splash, args)
      splash.private_mode_enabled = false
      splash.js_enabled = true
      splash.images_enabled = false
      assert(splash:go(args.url))

      -- Poll every 20 seconds until condition() returns true.
      local function wait_for(splash, condition)
        while not condition() do
          splash:wait(20.0)
        end
      end

      -- Wait until the availability table is present in the DOM.
      wait_for(splash, function()
        return splash:evaljs("document.querySelector('ul.availability-table-revamp') != null")
      end)

      assert(splash:wait(30.0))
      splash:set_viewport_full()
      return {
        html = splash:html(),
      }
    end

The page I'm crawling is [here][1].

How do I know for sure it's the page blocking me? There's no policy on the hotel's pages, but there is one (of course) on the engine's main page ...

I do, of course, have more code to show, but my guess is that the only thing that can remedy this is the Lua script. If you want to see more, the complete code is here :-)

Sure hope you're smarter than me (I guess I already know the answer to that, though).

Upvotes: 1

Views: 217

Answers (1)

ThunderMind

Reputation: 799

Sometimes websites block the user's IP. The page is accessible from my system, so try using different proxy servers.
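As a rough sketch of how a proxy could be passed along: Splash's `execute` endpoint accepts a `proxy` argument (a proxy URL) and a `timeout` argument in seconds. The helper name `build_splash_args`, the proxy URL and the shortened Lua script below are hypothetical placeholders, not taken from the question's code.

```python
from typing import Any, Dict, Optional


def build_splash_args(
    lua_source: str,
    proxy: Optional[str] = None,
    timeout: Optional[float] = None,
) -> Dict[str, Any]:
    """Build the `args` dict for a Splash `execute` request (hypothetical helper)."""
    args: Dict[str, Any] = {"lua_source": lua_source}
    if proxy:
        # Splash sends the page's requests through this proxy URL.
        args["proxy"] = proxy
    if timeout is not None:
        # Render timeout in seconds; Splash answers 504 when it is exceeded.
        args["timeout"] = timeout
    return args


# Placeholder values for illustration only.
LUA_SCRIPT = (
    "function main(splash, args) "
    "assert(splash:go(args.url)) "
    "return {html = splash:html()} "
    "end"
)
PROXY_URL = "http://user:pass@proxy.example.com:8080"

splash_args = build_splash_args(LUA_SCRIPT, proxy=PROXY_URL, timeout=90)

# In a spider, the dict would be handed to scrapy-splash roughly like this:
#   yield SplashRequest(url, self.parse, endpoint="execute", args=splash_args)
```

Note that Splash's default render timeout is 30 seconds, and a script that runs longer gets a 504 from Splash itself rather than from the target site. A `timeout` value only helps if the Splash server's `--max-timeout` setting allows it.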

Upvotes: 1
