deann

Reputation: 796

Selenium standalone in docker compose - killed by OS?

I am running an application using docker-compose. One of the containers is a selenium/standalone-chrome image. I give it a shm_size of 2g.

The application works fine when there is no high load. However, I have noticed that whenever there are concurrent requests to the selenium container (9 concurrent requests on an 8-core machine), Selenium fails silently. It just dies and stays dead. Subsequent requests are not handled. There is nothing in the logs. The last message is:

17:41:00.083 INFO [RemoteSession$Factory.lambda$performHandshake$0] - Started new session 5da2cd57f4e8e4f80b907564d7352051 (org.openqa.selenium.chrome.ChromeDriverService)

I am monitoring the RAM and CPU usage using both docker stats and top. RAM is fine, at about 50% used. free -m shows shared memory at about 500m. The 8 cores are taking the load, staying at around 80% most of the time. However, whenever the last request arrives, the processes just die out and CPU usage drops. Shared memory does not seem to be released, though.

In order to make it work again, I have to restart the application. Otherwise, none of the subsequent requests are received or logged.

I suspect there might be some kind of limit imposed by the OS on the containers, so that once they start consuming resources the OS kills them, but to be fair, I have no idea what is going on.

Any help would be greatly appreciated.

Thanks!

Update:

Here is my docker-compose reference

  selenium-chrome:
    image: selenium/standalone-chrome
    privileged: true
    shm_size: 2g
    expose:
      - "4444"

This is what my logs look like when it hangs: [screenshot]

And after I kill the docker-compose process and restart it: [screenshot]

I have also tested different images. These screenshots are actually with image selenium/standalone-chrome:3.141.59-gold.

One last thing that puzzles me even more: I am using selenium for screenshots, and I have added a webhook call in the Java code that fires if the process fails. I would expect it to fire when the selenium process dies. However, it seems the Java code does not consider the selenium connection dead and keeps waiting until I run docker-compose down. Only then are all the webhook messages fired.
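The behaviour described above (the Java side waiting forever on a dead browser) is what a blocking call without a deadline looks like. A minimal sketch of one way to bound the wait, assuming the screenshot request can be wrapped in a Callable: the call runs on a worker thread, and the caller gives up after a timeout and fires the failure hook. GuardedCall, callWithTimeout and the onFailure hook are hypothetical names for illustration, not part of Selenium or of the code in the question.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: none of these names come from the question's code.
final class GuardedCall {
    private GuardedCall() {}

    /**
     * Runs the task on a worker thread and waits at most timeoutMillis.
     * On timeout or failure, runs onFailure (e.g. the webhook call) and returns null.
     */
    static <T> T callWithTimeout(Callable<T> task, long timeoutMillis, Runnable onFailure) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "guarded-call");
            t.setDaemon(true); // a stuck call must not keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // The remote end never answered: stop waiting and report it.
            future.cancel(true);
            onFailure.run();
            return null;
        } catch (ExecutionException e) {
            // The task itself threw an exception.
            onFailure.run();
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            onFailure.run();
            return null;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A call such as GuardedCall.callWithTimeout(() -> takeScreenshot(url), 60_000, () -> notifyWebhook(url)) would then report the failure after a minute instead of at docker-compose down; takeScreenshot and notifyWebhook stand in for the existing application code. Note that cancelling the future only interrupts the worker thread. A thread blocked on a socket read may stay blocked, which is why the worker is a daemon thread.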

Update2: Here is what I have tried and I know so far:

1. The chrome driver version makes no difference.
2. Increasing shm_size makes no difference.
3. The JVM memory limit makes no difference - command: ["java", "-Xmx2048m", "-jar", "/opt/selenium/selenium-server-standalone.jar"]
4. It always hangs at the same spot: 8 concurrent processes on an 8-core machine.
5. Once dead, it stays dead.
6. Lots of chrome processes hang around - ps -aux | grep chrome
   If those processes are killed - sudo kill -9 $(ps aux | grep 'chrome' | awk '{print $2}') - selenium does not start again and stays dead.
7. The --no-sandbox option does not help.
8. The java process is alive on the host - telnet ip 4444 -> connects successfully

Upvotes: 4

Views: 4977

Answers (1)

Dudi Boy

Reputation: 4865

I suspect your selenium/standalone-chrome server is implemented in Java.

If so, the container's JVM has a bounded maximum memory, set with the JVM argument -Xmx2048m or a similar value.

Look into the selenium JVM setup and configuration files.

One or more of the following may be happening:

  1. The container crashed with out of memory because its memory bound was reached. Solution: decrease the JVM max memory bound to match the container's max memory bound (maybe 2048m > 2g).

  2. The JVM application crashed with out of memory. Solution: increase the JVM max memory bound to match the container's max memory bound (maybe 2048m is not sufficient for the task).

  3. The container hit its CPU utilization limit for a moment and crashed. I assume selenium runs with massive parallelism (check its configuration). Solution: give the container more compute power, or reduce selenium's parallelism.

Note that periodic resource monitoring tools fail to catch peak resource stress if the peak is momentary and sharp. If the stress builds up gradually, you can identify the breaking point.
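Option 3 above suggests reducing parallelism. If the selenium configuration itself is left alone, one way to try this from the Java client is to cap the number of browser sessions in flight with a semaphore. This is a minimal sketch under that assumption: SessionLimiter and its methods are hypothetical names, and the right cap (for example, fewer sessions than cores) would have to be found by experiment.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical client-side limiter; not part of Selenium.
final class SessionLimiter {
    private final Semaphore permits;

    SessionLimiter(int maxConcurrentSessions) {
        // Fair semaphore: waiting requests are served in arrival order.
        this.permits = new Semaphore(maxConcurrentSessions, true);
    }

    /**
     * Runs browserTask once a slot is free. Fails fast if no slot
     * becomes free within maxWaitMillis instead of queueing forever.
     */
    <T> T run(Callable<T> browserTask, long maxWaitMillis) throws Exception {
        if (!permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS)) {
            throw new IllegalStateException(
                "no free browser slot within " + maxWaitMillis + " ms");
        }
        try {
            return browserTask.call();
        } finally {
            permits.release(); // always free the slot, even on failure
        }
    }

    int availableSlots() {
        return permits.availablePermits();
    }
}
```

With new SessionLimiter(4) shared by all request handlers, a ninth concurrent request waits for a slot rather than starting another Chrome instance. If the hang disappears at a lower cap, that points to resource exhaustion under peak load rather than a memory setting.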

Upvotes: 2
