Reputation: 796
I am running an application using docker-compose. One of the containers is a selenium/standalone-chrome image. I give it a shm_size of 2g.
The application works fine when there is no high load. However, I have noticed that whenever there are concurrent requests to the selenium container (9 concurrent requests on an 8-core machine), Selenium fails silently. It just dies and stays dead. Subsequent requests are not handled. There is nothing in the logs. The last message is:
17:41:00.083 INFO [RemoteSession$Factory.lambda$performHandshake$0] - Started new session 5da2cd57f4e8e4f80b907564d7352051 (org.openqa.selenium.chrome.ChromeDriverService)
I am monitoring the RAM and CPU usage using both docker stats and top. RAM is fine, about 50% used. free -m shows shared memory at about 500m. The 8 cores are taking the load, staying at around 80% most of the time. However, whenever the last request arrives, the processes just die out. CPU usage drops. Shared memory does not seem to be released, though.
In order to make it work again, I have to restart the application. Otherwise, none of the subsequent requests are received or logged.
I suspect there might be some kind of limit imposed by the OS on the containers, and that once they consume too many resources the OS kills them, but to be fair, I have no idea what is going on.
Any help would be greatly appreciated.
Thanks!
Update:
Here is the relevant service from my docker-compose file:
selenium-chrome:
  image: selenium/standalone-chrome
  privileged: true
  shm_size: 2g
  expose:
    - "4444"
This is what my logs look like when it hangs:
And after I kill the docker-compose process and restart it:
I have also tested different images. These screenshots were actually taken with the image selenium/standalone-chrome:3.141.59-gold.
One last thing that puzzles me even more: I am using Selenium for screenshots, and I have added a webhook call in the Java code for the case where the process fails. I would expect it to fire if the Selenium process dies; however, it seems the Java side does not consider the Selenium connection dead and keeps waiting until I run docker-compose down. Only then are all the webhook messages fired.
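For reference, one way to make the webhook fire even when the remote call never returns is to put a hard timeout around the Selenium call on the client side. A minimal sketch using only the JDK; takeScreenshot() and notifyWebhook() are hypothetical placeholders for the actual application code, and the 60-second limit is just an example:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ScreenshotWithTimeout {

    private static final ExecutorService pool = Executors.newCachedThreadPool();

    // Placeholder for the RemoteWebDriver call that currently blocks forever.
    static byte[] takeScreenshot() throws Exception {
        return new byte[0];
    }

    // Placeholder for the existing webhook call.
    static void notifyWebhook(String reason) {
    }

    public static void main(String[] args) {
        Future<byte[]> job = pool.submit(ScreenshotWithTimeout::takeScreenshot);
        try {
            // Give up after 60 seconds instead of waiting on a dead connection indefinitely.
            byte[] png = job.get(60, TimeUnit.SECONDS);
            // ... use the screenshot
        } catch (TimeoutException e) {
            job.cancel(true);                 // interrupt the stuck Selenium call
            notifyWebhook("selenium call timed out");
        } catch (Exception e) {
            notifyWebhook("selenium call failed: " + e.getMessage());
        } finally {
            pool.shutdown();
        }
    }
}

Note that job.cancel(true) only interrupts the worker thread; whether the underlying HTTP request is actually aborted depends on the HTTP client in use, but at least the webhook is no longer blocked by it.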
Update 2: Here is what I have tried and what I know so far:
1. chrome driver version makes no difference
2. shm_size increase does not make any difference
3. jvm memory limit makes no difference - command: ["java", "-Xmx2048m", "-jar", "/opt/selenium/selenium-server-standalone.jar"]
4. always hangs at the same spot: 8 concurrent processes on an 8-core machine
5. once dead, stays dead
6. lots of chrome processes hang around - ps aux | grep chrome
6.1 if those processes are killed with sudo kill -9 $(ps aux | grep 'chrome' | awk '{print $2}'), the server does not recover and stays dead
7. --no-sandbox option does not help
8. the java process is alive on the host - telnet ip 4444 connects successfully (see the status probe sketch after this list)
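Regarding point 8: telnet only proves that the port accepts TCP connections. To check whether the server's HTTP layer still answers in the dead state, the Selenium 3 status endpoint can be probed with explicit timeouts. A small sketch using only the JDK; the host and port are assumed to match the compose file above:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class SeleniumStatusProbe {

    public static void main(String[] args) throws Exception {
        // Host of the standalone-chrome container; override via the first argument.
        String host = args.length > 0 ? args[0] : "localhost";
        URL url = new URL("http://" + host + ":4444/wd/hub/status");

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(5_000);   // fail fast instead of hanging
        conn.setReadTimeout(5_000);
        conn.setRequestMethod("GET");

        // A hang, timeout, or non-2xx response surfaces here as an exception,
        // which is exactly the signal we are looking for.
        System.out.println("HTTP " + conn.getResponseCode());
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}

If this probe hangs or times out while telnet still connects, the listener is alive but request handling is stuck, which points at the Selenium/Chrome side rather than the network.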
Upvotes: 4
Views: 4977
Reputation: 4865
I suspect your selenium/standalone-chrome container is implemented in Java, and that its JVM has a bounded maximum heap set via a JVM argument such as -Xmx2048m or a similar value.
Research the selenium JVM setup/configuration files.
What can happen is one or more of the following:
The container crashed with out of memory because its memory bound was reached. Solution: decrease the JVM max heap so it fits within the container's memory bound (maybe -Xmx2048m is more than the container is allowed to use).
The JVM application crashed with out of memory. Solution: increase the JVM max heap, up to the container's memory bound (maybe 2048m is not sufficient for the task). A quick way to compare the two bounds is sketched after this list.
The container peaked at its CPU utilization limit for a moment and crashed. I assume selenium runs with a high degree of parallelism (check its configuration). Solution: provide more compute power to the container, or reduce selenium's parallelism.
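To tell the first two cases apart, it can help to compare, from inside the container, the JVM's maximum heap with the memory limit the container actually has. A minimal sketch, assuming it is run inside the container (for example via docker exec) on a cgroup v1 host; the limit file path is different under cgroup v2:

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class HeapVsContainerLimit {

    public static void main(String[] args) throws Exception {
        // Maximum heap the JVM is willing to grow to (-Xmx, or the default).
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("JVM max heap:     " + (maxHeap / (1024 * 1024)) + " MiB");

        // Container memory limit as exposed by cgroup v1.
        Path cgroupLimit = Paths.get("/sys/fs/cgroup/memory/memory.limit_in_bytes");
        if (Files.exists(cgroupLimit)) {
            long limit = Long.parseLong(Files.readAllLines(cgroupLimit).get(0).trim());
            System.out.println("cgroup mem limit: " + (limit / (1024 * 1024)) + " MiB");
        } else {
            System.out.println("cgroup v1 memory limit file not found (cgroup v2 host?)");
        }
    }
}

Keep in mind that the Chrome processes run outside the JVM, so their memory is not covered by -Xmx; the container's memory bound has to accommodate both.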
Note that periodic resource monitoring tools fail to catch resource spikes that are momentary and sharp; only if the resource stress builds up gradually can you identify the breaking point with them.
Upvotes: 2