Reputation: 99
I want to reverse engineer the content that is loaded when scrolling down a web page. The problem is that the screwrand parameter in the URL https://www.crowdfunder.com/user/following_page/80159?user_id=80159&limit=0&per_page=20&screwrand=933 doesn't seem to follow any pattern, so reconstructing the URLs doesn't work. I'm considering rendering the page automatically with Splash. How can I use Splash to scroll the way a browser does? Thanks a lot!
Here is the code for the two requests:
request1 = scrapy_splash.SplashRequest(
    'https://www.crowdfunder.com/user/following/{}'.format(user_id),
    self.parse_follow_relationship,
    args={'wait': 2},
    meta={'user_id': user_id, 'action': 'following'},
    endpoint='http://192.168.99.100:8050/render.html')
yield request1
request2 = scrapy_splash.SplashRequest(
    'https://www.crowdfunder.com/user/following_user/80159?user_id=80159&limit=0&per_page=20&screwrand=76',
    self.parse_tmp,
    meta={'user_id': user_id, 'action': 'following'},
    endpoint='http://192.168.99.100:8050/render.html')
yield request2
ajax request shown in browser console
Upvotes: 8
Views: 7877
Reputation: 3
I don't think hard-coding the number of scrolls is a good idea for infinite-scroll pages, so I modified the above-mentioned code to keep scrolling until the page height stops growing:
function main(splash, args)
    -- Height already scrolled to; starts at 0 so the loop runs at least once
    local current_scroll = 0
    local scroll_to = splash:jsfunc("window.scrollTo")
    local get_body_height = splash:jsfunc(
        "function() {return document.body.scrollHeight;}"
    )
    assert(splash:go(splash.args.url))
    splash:wait(3)
    local height = get_body_height()
    -- Keep scrolling to the bottom until the page height stops growing
    while current_scroll < height do
        scroll_to(0, get_body_height())
        splash:wait(5)
        current_scroll = height
        height = get_body_height()
    end
    splash:set_viewport_full()
    return splash:html()
end
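The stopping rule in the script above can be checked without a browser. Below is a minimal Python sketch of the same loop, run against a fake page. The names `scroll_until_stable` and `FakePage` are hypothetical, and the `max_rounds` cap is an added assumption (the Lua script has no such limit) to guard against pages that never stop growing.

```python
def scroll_until_stable(get_height, scroll_to, wait, max_rounds=50):
    """Scroll to the bottom until the page height stops growing.

    Returns the number of scrolls performed. max_rounds is a safety cap
    that the original Lua script does not have (assumption).
    """
    seen_height = 0            # height we have already scrolled to
    height = get_height()
    rounds = 0
    while seen_height < height and rounds < max_rounds:
        scroll_to(0, height)   # jump to the current bottom
        wait()                 # give the page time to load more items
        seen_height = height
        height = get_height()  # re-measure; unchanged means no new content
        rounds += 1
    return rounds


class FakePage:
    """Stand-in for a browser page: grows 1000px per scroll up to max_height."""

    def __init__(self, height=1000, max_height=4000):
        self.height = height
        self.max_height = max_height
        self.scroll_y = 0

    def get_height(self):
        return self.height

    def scroll_to(self, x, y):
        self.scroll_y = y
        if self.height < self.max_height:
            self.height += 1000  # simulate lazily loaded content


page = FakePage()
rounds = scroll_until_stable(page.get_height, page.scroll_to, wait=lambda: None)
```

With this fake page the loop scrolls four times: three scrolls that each trigger new content, plus a final one that confirms the height no longer changes.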
Upvotes: 0
Reputation: 22238
To scroll a page you can write a custom rendering script (see http://splash.readthedocs.io/en/stable/scripting-tutorial.html), something like this:
function main(splash)
    local num_scrolls = 10
    local scroll_delay = 1.0
    local scroll_to = splash:jsfunc("window.scrollTo")
    local get_body_height = splash:jsfunc(
        "function() {return document.body.scrollHeight;}"
    )
    assert(splash:go(splash.args.url))
    splash:wait(splash.args.wait)
    for _ = 1, num_scrolls do
        scroll_to(0, get_body_height())
        splash:wait(scroll_delay)
    end
    return splash:html()
end
To run this script, use the 'execute' endpoint instead of the render.html endpoint:
script = """<Lua script> """
scrapy_splash.SplashRequest(url, self.parse,
    endpoint='execute',
    args={'wait': 2, 'lua_source': script}, ...)
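Outside Scrapy, the same 'execute' endpoint can be called through Splash's HTTP API by POSTing the arguments as JSON. The sketch below only builds the request with the standard library and does not send it. The helper name `build_execute_request`, the Splash address `http://localhost:8050`, and the target `https://example.com` are assumptions for illustration.

```python
import json
import urllib.request


def build_execute_request(splash_base, page_url, lua_source, wait=2.0):
    """Build (but do not send) a POST request for Splash's execute endpoint."""
    payload = {
        "lua_source": lua_source,  # the Lua script to run
        "url": page_url,           # read in Lua as splash.args.url
        "wait": wait,              # read in Lua as splash.args.wait
    }
    return urllib.request.Request(
        splash_base.rstrip("/") + "/execute",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Shortened placeholder script; substitute the full scrolling script here.
LUA_SCRIPT = """
function main(splash)
    assert(splash:go(splash.args.url))
    splash:wait(splash.args.wait)
    return splash:html()
end
"""

req = build_execute_request("http://localhost:8050", "https://example.com", LUA_SCRIPT)
# To actually send it (requires a running Splash instance):
# html = urllib.request.urlopen(req).read().decode("utf-8")
```

Keeping the extra arguments in the JSON body mirrors what the `args` dict does in `SplashRequest`: every key becomes available to the script under `splash.args`.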
Upvotes: 21
Reputation: 220
Thanks Mikhail, I tried your scroll script and it worked. However, I noticed that it scrolls too far in a single step, so some JavaScript has no time to render and gets skipped. I made a small change so that each scroll is split into ten smaller steps:
function main(splash)
    local num_scrolls = 10
    local scroll_delay = 1
    local scroll_to = splash:jsfunc("window.scrollTo")
    local get_body_height = splash:jsfunc(
        "function() {return document.body.scrollHeight;}"
    )
    assert(splash:go(splash.args.url))
    splash:wait(splash.args.wait)
    for _ = 1, num_scrolls do
        local height = get_body_height()
        for i = 1, 10 do
            scroll_to(0, height * i/10)
            splash:wait(scroll_delay/10)
        end
    end
    return splash:html()
end
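To see which positions the inner loop visits, here is a small Python sketch of the same arithmetic. The helper name `scroll_steps` and the 2000px page height are made up for illustration.

```python
def scroll_steps(height, steps=10):
    """Return the y offsets visited when one scroll is split into equal steps."""
    return [height * i / steps for i in range(1, steps + 1)]


scroll_delay = 1
offsets = scroll_steps(2000)       # assumed page height of 2000px
per_step_wait = scroll_delay / 10  # same total wait as one full scroll
```

For a 2000px page this visits 200, 400, and so on up to 2000, waiting 0.1 seconds at each stop, so the total delay per pass matches the original script while the page sees ten smaller scroll events.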
Upvotes: 4