Selenium Grid WebDriver returning 504 Gateway Time-out while waiting for grid to scale

Question

I have a Selenium Grid running on AWS Fargate that autoscales based on the number of desired sessions on the hub. One service runs the hub task and another runs the node tasks. I use one session per node because of the resources each session requires, and because overall execution speed is not the primary goal of this test suite. I also always keep at least one node running.

The autoscaling itself works: the hub sees that it needs more nodes and scales the node service up accordingly. The hub holds the session request until a node is available and then places it there correctly.

The tests work perfectly when I run one at a time. The problem is that when I run a group in parallel and the grid needs to scale up, I get a 504 Gateway Time-out 30 seconds after calling the WebDriver. I've tried changing every setting I could find to raise this timeout, to no avail.

My hub config looks like

browserTimeout : 0
debug : false
jettyMaxThreads : -1
host : XXXXXXXXX
port : 4444
role : hub
timeout : 180000
cleanUpCycle : 5000
maxSession : 5
hubConfig : /opt/selenium/config.json
capabilityMatcher : org.openqa.grid.internal.utils.DefaultCapabilityMatcher
newSessionWaitTimeout : -1
throwOnCapabilityNotPresent : true
registry : org.openqa.grid.internal.DefaultGridRegistry

The node config looks like

browserTimeout: 0
debug: false
jettyMaxThreads: -1
host: XXXXXXXXX
port: 5555
role: node
timeout: 1800
cleanUpCycle: 5000
maxSession: 1
capabilities: Capabilities {applicationName: , browserName: chrome, maxInstances: 1, platform: LINUX, platformName: LINUX, seleniumProtocol: WebDriver, server:CONFIG_UUID: ..., version: 66.0.3359.170}
downPollingLimit: 2
hub: http://XXXXXXXXX:4444/grid/register
id: http://XXXXXXXXX:5555
nodePolling: 5000
nodeStatusCheckTimeout: 5000
proxy: org.openqa.grid.selenium.proxy.DefaultRemoteProxy
register: true
registerCycle: 5000
remoteHost: http://XXXXXXXXX:5555
unregisterIfStillDownAfter: 10000

I'm calling my Selenium tests via JRuby for business reasons, and the basic configuration looks like this:

co = Java::OrgOpenqaSeleniumChrome::ChromeOptions.new
co.add_arguments(["--disable-extensions"].to_java(:string))
co.add_arguments(["no-sandbox"].to_java(:string))
co.add_arguments("--headless")
chrome_prefs = {}
chrome_prefs["profile.default_content_settings.popups"] = 0.to_s
chrome_prefs["safebrowsing.enabled"] = "true"
co.set_experimental_option("prefs", chrome_prefs)
cap = Java::OrgOpenqaSeleniumRemote::DesiredCapabilities.chrome
cap.set_capability(Java::OrgOpenqaSeleniumChrome::ChromeOptions::CAPABILITY, co)

$grid_url = ENV['GRID_URL']
$driver = Java::OrgOpenqaSeleniumRemote::RemoteWebDriver.new(Java::JavaNet::URL.new($grid_url), cap) 
# The 504 Gateway Time-out is returned about 30 seconds into the RemoteWebDriver.new call above

Does anyone have any idea how to change the timeout here?

alexd3 · Accepted Answer

This turned out to have nothing to do with the Selenium setup. It applies to anyone who runs the hub behind a load balancer on Fargate, or on ECS in general.

If you based your CloudFormation template on the AWS Fargate examples on their GitHub, make sure you change the value they set for idle_timeout.timeout_seconds on the load balancer. The load balancer returns the 504 when the hub holds the new-session request longer than that idle timeout, which is exactly what happens while the grid is scaling up.
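As a rough sketch of where that setting lives, assuming the hub sits behind an Application Load Balancer defined in CloudFormation: the idle timeout is one of the load balancer's attributes. The logical ID, subnet and security group references below are hypothetical placeholders, and 300 seconds is only an illustrative value; it should be longer than the time a new node task needs to start and register with the hub.

```yaml
# Fragment of a CloudFormation template, not a complete stack.
Resources:
  HubLoadBalancer:                      # hypothetical logical ID
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    Properties:
      Subnets:
        - !Ref SubnetOne                # hypothetical references
        - !Ref SubnetTwo
      SecurityGroups:
        - !Ref HubLoadBalancerSG        # hypothetical reference
      LoadBalancerAttributes:
        # How long a request may wait without a response before the
        # load balancer answers with 504. The 300 here is an assumed value.
        - Key: idle_timeout.timeout_seconds
          Value: '300'
```

With a 30-second value here, any new-session request that waits longer than 30 seconds for a node would be cut off by the load balancer before the hub can answer, regardless of the hub's own timeout settings.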
