Reputation: 21
I'm following a tutorial on using Splash to extract data from a table on a JavaScript-rendered website. The script keeps scraping the first page instead of clicking through to the next page, so I end up with 10 copies of the same page. I've tried changing the JS path for the button, but I get the same result.
Does anyone know where I'm going wrong? Here is the URL I'm scraping: https://eservices.customs.gov.hk/MSOS/wsrh/001s1?searchBy=ALL
Here is the Lua script I'm running in Splash:
function main(splash, args)
    assert(splash:go(args.url))
    assert(splash:wait(0.5))
    treat=require('treat')
    result= {}
    for i=1,9,1
    do
        assert(splash:runjs('document.querySelector("#next_grid-table-pubSrch > span").click()'))
        result[i]=splash.html()
    end
    return treat.as_array(result)
end
Upvotes: 0
Views: 441
Reputation: 21
It turns out I just needed to remove the span from the selector and click the button element itself. Here is the updated script for anyone who has a similar problem. Note that it also waits after each click and calls splash:html() with a colon rather than a dot. I hit a 504 error around page 99 of 205, so I still have to work that out. I'll update when I solve it; no need to reply, as you'd need my Scrapy code for that. This is just for educational viewing for now.
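function main(splash, args)
    -- Load the first page and give the table time to render.
    assert(splash:go(args.url))
    assert(splash:wait(0.3))
    local treat = require('treat')
    local result = {}
    for i = 1, 205 do
        -- Click the "next page" button itself, not the span inside it.
        assert(splash:runjs('document.querySelector("#next_grid-table-pubSrch").click()'))
        -- Wait for the table to update before capturing the HTML.
        assert(splash:wait(0.3))
        result[i] = splash:html()
    end
    -- Mark the table as an array so it is returned as a JSON array.
    return treat.as_array(result)
end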
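The fixed 0.3 second wait in the script above assumes every page loads within that time. As a rough sketch of an alternative, the loop below polls the grid after each click and only captures the HTML once its text has changed. The selector "#grid-table-pubSrch", the max_pages argument, and the polling limits are assumptions for illustration and are not taken from the original script; check the real table selector in the browser inspector before using it.

```lua
function main(splash, args)
    local treat = require('treat')
    -- Hypothetical settings, not from the original post.
    local max_pages = args.max_pages or 205
    local max_polls = 20 -- give up on a page after 20 polls of 0.1 s each
    -- Assumed selector for the grid; returns "" if the element is missing.
    local grid_text_js = [[
        (function () {
            var g = document.querySelector("#grid-table-pubSrch");
            return g ? g.innerText : "";
        })()
    ]]

    assert(splash:go(args.url))
    assert(splash:wait(0.5))

    local result = {}
    local previous = splash:evaljs(grid_text_js)
    for i = 1, max_pages do
        assert(splash:runjs('document.querySelector("#next_grid-table-pubSrch").click()'))
        -- Poll until the grid text differs from the previous page's text.
        local current = previous
        for _ = 1, max_polls do
            assert(splash:wait(0.1))
            current = splash:evaljs(grid_text_js)
            if current ~= previous then break end
        end
        -- No change after all polls: treat it as the last page or a stall.
        if current == previous then break end
        result[i] = splash:html()
        previous = current
    end
    return treat.as_array(result)
end
```

Like the script above, this starts capturing after the first click, so the first page itself is not in the result. If the 504 comes from Splash's own render timeout (30 seconds by default), which is an assumption about the cause, stopping early this way at least returns the pages collected so far once the grid stops changing; raising the timeout argument on the request to Splash may also be worth trying.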
Upvotes: 2