Reputation: 1187
I am scraping a web page with dryscrape (because I need the JavaScript-rendered parts), and I use eval_script() to suppress some JavaScript-based error checking on the page. The script I'm suppressing is essentially an onkeyup listener that forces the user to pick a value from a dropdown, which I need to avoid.
This is the eval_script() call -
session.eval_script("$('#input_elem').removeAttr('onblur onclick onkeyup');")
Now the overall scraping takes much longer than my other implementation for a page on the same domain, which needs no JavaScript modifications (and hence no eval_script() calls).
I did a bit of profiling with time.time() to see where the script was slowing down, and it is indeed spending a long time on the eval_script() step(s). Here are the results -
Starting to access at 0.00997018814087
Visited page https://*****/***.aspx 1.30053019524
First eval script run done 5.97628307343
Second eval script run done 9.61053919792
xpath 1 9.6632771492
xpath 2 9.7702870369
xpath 3 9.90402317047
xpath of button to be clicked 9.91756606102
Button clicked 9.97191905975
Second page visited 10.4508111477
Loop 1 else 10.4525721073
xpath 4 10.5330061913
xpath 5 10.6111950874
xpath 6 10.6918411255
xpath 7 10.7721481323
Range begins 10.8208150864
3
Range ends 13.0008580685
Looping through the table elements takes about 2 seconds, but the two eval_script() steps combined take about 8 seconds. When I run the same scripts in the Chrome dev tools console, they finish instantly. Why is the dryscrape implementation so much slower?
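For reference, elapsed-time checkpoints like the ones in the log above can be recorded with a small helper built on time.time(). This is only a sketch: the class name Checkpoints and its methods are hypothetical and not part of dryscrape. Besides printing the running total, it also reports the time spent between consecutive checkpoints, which makes a slow step such as eval_script() easier to spot.

```python
import time


class Checkpoints(object):
    """Record elapsed time since creation at named checkpoints.

    Hypothetical helper for illustration; not part of dryscrape.
    """

    def __init__(self, clock=time.time):
        self._clock = clock
        self._start = clock()
        self.marks = []  # list of (label, seconds since start)

    def mark(self, label):
        """Store and print the seconds elapsed since the start."""
        elapsed = self._clock() - self._start
        self.marks.append((label, elapsed))
        print("%s %s" % (label, elapsed))
        return elapsed

    def deltas(self):
        """Return (label, seconds spent since the previous checkpoint)."""
        result = []
        previous = 0.0
        for label, elapsed in self.marks:
            result.append((label, elapsed - previous))
            previous = elapsed
        return result
```

Usage would be to create one Checkpoints() before the first page visit, call mark("First eval script run done") and so on after each step, and inspect deltas() at the end to see which step dominates.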
Upvotes: 0
Views: 387
Reputation: 1187
Using jQuery in eval_script() seems to be the culprit. I was able to reduce the script execution time significantly by using plain JavaScript -
session.eval_script("document.getElementById('input_elem').removeAttribute('onblur');")
I had to use two lines for the two separate attribute removals on the same element, since removeAttribute() takes a single attribute name per call.
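If several attributes need to go, one option is to build a single plain-JavaScript snippet that loops over the attribute names, so only one eval_script() call is made per element. The sketch below only builds the script string; build_remove_attrs_script is a hypothetical helper name, and whether one combined call is faster than separate calls in dryscrape has not been measured here.

```python
import json


def build_remove_attrs_script(element_id, attributes):
    """Return plain JavaScript that strips the given attributes from one element.

    Hypothetical helper for illustration; not part of dryscrape.
    """
    # json.dumps produces valid JavaScript string and array literals,
    # so the id and attribute names are quoted and escaped safely.
    return (
        "(function () {"
        "var el = document.getElementById(%s);"
        "if (!el) { return; }"
        "var names = %s;"
        "for (var i = 0; i < names.length; i++) {"
        " el.removeAttribute(names[i]); }"
        "})();"
    ) % (json.dumps(element_id), json.dumps(list(attributes)))


script = build_remove_attrs_script(
    "input_elem", ["onblur", "onclick", "onkeyup"]
)
# With a live dryscrape session (not created in this sketch):
# session.eval_script(script)
```

The generated script uses no jQuery, so it stays on the plain-DOM path that turned out to be fast above.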
These are my profiling logs now -
Starting to access at 0.0151550769806
Visited page https://*****/***.aspx 1.73412919044
First eval script run done 1.77594304085
First eval script part 2 run done 1.81522011757
Second eval script run done 1.85607099533
xpath 1 1.94704914093
xpath 2 2.03846216202
xpath 3 2.13886809349
xpath of button to be clicked 2.26395010948
Button clicked 2.27277112007
Second page visited 3.30618906021
Loop 1 else 3.38708400726
xpath 4 3.46828198433
xpath 5 3.54840707779
xpath 6 3.63034701347
xpath 7 3.7106590271
Range begins 3.75155210495
3
Range ends 5.91926407814
Even now, each step in the range loop takes about 0.7 seconds, which I'm aiming to reduce further.
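The per-step figure can be checked against the log with a quick calculation. This assumes that the "3" printed between "Range begins" and "Range ends" is the number of loop iterations, which the log itself does not state.

```python
# Timestamps copied from the second profiling log.
range_begins = 3.75155210495
range_ends = 5.91926407814
iterations = 3  # assumption: the "3" in the log is the loop count

per_iteration = (range_ends - range_begins) / iterations
print(round(per_iteration, 2))  # 0.72
```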
Upvotes: 0