Reputation: 308
I wish to create a script that can interact with any webpage with as little delay as possible, excepting network delay. This is to be used for arbitrage trading so any tips in this direction are welcome.
My current approach has been to use Selenium with Python as I am a beginner in web scraping. However, I have investigated some approaches and from my findings it seems that Selenium has a considerable delay (See benchmark results below) I mention other things besides it in this question but the focus is on Selenium.
In general, I need to send a BUY or SELL request to whatever broker I am currently working with. Using the web browser client, I have found a few approaches to do this:
1. Actually clicking the button
a. Use Selenium to Click the button with the corresponding request
b. Use Puppeteer to Click
c. Use Pynput or other direct mouse input manipulation
2. Reverse-engineering the request and sending it directly without clicking any buttons
Now, the delay in sending this request needs to be minimized as much as possible. Network delay is out of our control for the purpose of this optimization.
I have benchmarked the approaches for 1.
by opening a page in the standard way you would, either with puppeteer and selenium then waiting for a few seconds. While the script waits, I injected into the browser the following code:
$x('//*[@id="id_demo_webfront"]/iframe')[0].contentDocument.body.addEventListener('click', (data => console.log(new Date().getTime())), true);
The purpose of this code is to log in console the current time when the click is registered by the browser.
Then, I log in my python(Selenium, pynput
)/javascript(Puppeteer
) script the current time right before issuing the click. I am running on Ubunutu 18.04 as opposed to Windows so my system clock should have good resolution. Additional reading about this here and here.
Actual results, all were run on the same webpage multiple times, for roughly 10 clicks each run:
1. a. ~80ms
1. b. ~10-30ms
1. c. ~5-10ms
For 2.
, I haven't reliably tested the delay. The idea behind this approach is to inject a function that when fired will send a request that exactly resembles a request that would be sent when the button is clicked. I have not tested the delay between issuing the command to run such an injected function and it actually being run, however I expect this approach to be even faster, basically creating my own client to interact with whatever API the broker has on the backend.
From the results, it is pretty clear that issuing mouse commands seems to be the quickest, but it also is the hardest for me to implement reliably. Also, I seem to find puppeteer running faster across the board, but I prefer selenium in python for ease of development and was wondering if there are any tips and ideas to speed up those delays I am experiencing.
Why does Selenium with Python have such a delay to issue commands and can it be improved?
Why does Puppeteer seem to have a lower delay when interacting with the same browser?
Some code snippets of my approach:
class DMMPageInterface:
# These are part of the interface script, self.page is the webpage after initialization
def __init__(self, config):
self.bid_price = self.page.get_element('xpath goes here')
...
# Mostly only the speed of this operation matters, sending the trade request
def bid_click(self):
logger.debug("Clicking bid")
if USE_MOUSE:
mouse.position = (720,390)
mouse.click(Button.left, 1)
else:
self.bid_price.click()
Upvotes: 1
Views: 947
Reputation: 3743
Quite a few questions there!
"Why does Selenium with Python have such a delay to issue commands"
I don't have any direct referencable evidence for this but the selenium delay is probably down to the amount of work it has to do. It was created for testing not for performance. The difference in that is that the most valuable part of a test is that MUST run reliably, if it's slower and does more checks in order to make it more reliable than so be it. You still get a massive performance gain over manual testing, and that's what we're replacing.
This is where you'll need the reliability - and i see you mention this need in your question too. It's great if you have a trading script that can complete an action in <1s - but what if it fails 5% of the time? How much will that cause problems?
Selenium also needs to render the GUI then, depending on how you identify your objects it needs to scan the entire DOM for what you want. Generic locators will logically take longer.
Selenium's greater power is the simplicity and the provided ability synchronisation. There's lots of talk in lots of SO articles about webdriverwait (which is great for in-page script sync), but there's also good recognition of page load wait-times. i.e. no point pushing the button before all the source is loaded.
"Why does Puppeteer seem to have a lower delay when interacting with the same browser?" Puppeteer works with chrome devtools - selenium works with the DOM. So the two are interacting in a different way. Obviously, puppeteer is only for chrome, but selenium can go to almost any browser. It might not be a problem for you if you don't care about cross-browser capability.
However - how much faster is it really?
In your execution timings, you might also want to factor in the whole life cycle of what you aim to do - namely, include: browser load time, and page render time. From your story, you want to buy or sell something. So, you kick off the script - how long does it take from pressing GO to the end result. You most likely will need synchronisation and that's where puppeteer might not be as strong (or take you much longer to get a handle on).
You say you want to "interact with any webpage with as little delay as possible", but you have to get to the page first and get it in the right state. I wouldn't be concerned about milliseconds per action. Kicking open chrome isn't the quickest action in itself and your overall millisecond test becomes >seconds to minutes.
When you have full selenium script taking 65 seconds and a puppeteer script taking 64 seconds, the overall consideration changes :-)
Some other options to think about if you really need speed:
If it was me, and it was about a single performant action - my first port of call would be doing it at the API level. There's lots of resources and practice sites out there to help get you started and there's no browser to render, there's barely anything more than the network delay. That's the real way to keep it as a sub-second script.
Final thought, when you ask "can it be improved": if you have some code, there may be other ways to make it faster. Share your script so far and i'm sure folks will love to chip in their 2 cents on how you can streamline it :-)
Upvotes: 1