benjibot

Reputation: 21

How do I get Splash to click through to the next page using a JS path?

I'm following a tutorial on using Splash to extract data from a table on a JavaScript-rendered website. The script keeps scraping the main page instead of clicking through to the next page, so I end up with 10 copies of the same page. I've tried changing the JS path of the button, but I get the same result.

Does anyone know where I'm going wrong? Here is the URL I'm scraping: https://eservices.customs.gov.hk/MSOS/wsrh/001s1?searchBy=ALL

Here is the Lua script I'm running in Splash:

function main(splash, args)
  assert(splash:go(args.url))
  assert(splash:wait(0.5))
  treat=require('treat')
  result= {}
  for i=1,9,1
  do
    assert(splash:runjs('document.querySelector("#next_grid-table-pubSrch > span").click()'))
    result[i]=splash.html()
  end
  return treat.as_array(result)

end

Upvotes: 0

Views: 441

Answers (1)

benjibot

Reputation: 21

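Turns out I just needed to remove the span tag from the selector, so the click targets #next_grid-table-pubSrch itself rather than its child span. Here is the updated script for anyone with a similar problem; compared with the original, it also waits after each click and calls splash:html() with a colon. I hit a 504 error around page 99 of 205, so I'll have to work that out. I'll update when I solve this; no need to reply, as you'd need my Scrapy code for that. This is just for educational viewing for now.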

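function main(splash, args)
  -- Load the search page and give the table time to render
  assert(splash:go(args.url))
  assert(splash:wait(0.3))
  local treat = require('treat')
  local result = {}
  for i = 1, 205 do
    -- Click the "next" element itself (no "> span" child in the selector)
    assert(splash:runjs('document.querySelector("#next_grid-table-pubSrch").click()'))
    -- Wait for the table to update before taking the snapshot
    assert(splash:wait(0.3))
    -- result[i] holds the HTML of the page reached after the i-th click
    result[i] = splash:html()
  end
  -- Return the snapshots as a JSON array rather than an object
  return treat.as_array(result)
end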
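
On the 504: one possible cause, offered as an assumption rather than a confirmed diagnosis, is Splash's render timeout. The `timeout` argument defaults to 30 seconds, and 99 clicks with a 0.3 s wait each already add up to roughly 30 seconds. The Python sketch below builds a JSON payload for Splash's `/execute` endpoint with a longer `timeout` and passes the page count and wait as script arguments. The helper names (`build_payload`, `render`), the `pages` and `wait` arguments, and the `http://localhost:8050` address are assumptions for illustration, not part of the original Scrapy project. Splash caps `timeout` at its `--max-timeout` setting, so that may need raising as well.

```python
import json
import urllib.request

# Lua script adapted from the answer above. The page count and the wait
# come from args.pages and args.wait (hypothetical argument names).
LUA_SOURCE = """
function main(splash, args)
  assert(splash:go(args.url))
  assert(splash:wait(args.wait))
  local treat = require('treat')
  local result = {}
  for i = 1, args.pages do
    assert(splash:runjs('document.querySelector("#next_grid-table-pubSrch").click()'))
    assert(splash:wait(args.wait))
    result[i] = splash:html()
  end
  return treat.as_array(result)
end
"""


def build_payload(url, pages, wait=0.3, timeout=90):
    """Build the JSON body for a POST to Splash's /execute endpoint.

    Keys other than lua_source and timeout are forwarded to the script
    as fields of `args`.
    """
    return {
        "url": url,
        "lua_source": LUA_SOURCE,
        "pages": pages,
        "wait": wait,
        "timeout": timeout,
    }


def render(payload, splash_url="http://localhost:8050/execute"):
    """POST the payload to a running Splash instance (address is assumed)."""
    request = urllib.request.Request(
        splash_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # Allow the HTTP client a little longer than the render timeout itself.
    with urllib.request.urlopen(request, timeout=payload["timeout"] + 10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `render(build_payload(url, pages=205))` against a running Splash instance should return the list of HTML snapshots if the timeout was indeed the problem. With scrapy-splash, the same keys would go in the `args` dict of a `SplashRequest` that uses `endpoint='execute'`.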

Upvotes: 2
