RobertL

Reputation: 76

Jenkins becomes unresponsive when a lot of builds are started

We are using Gerrit together with Jenkins in a master/slave configuration. Every time a new change is submitted to Gerrit, a bunch of test and linting jobs (7 to be exact) are triggered on Jenkins.

Quite often (especially when other jobs, e.g. deployments, are running as well), Jenkins becomes unresponsive for 10-15 minutes.

All Jenkins machines are VMs that should have enough resources to handle such a load.

We have already:

Even though there should be no jobs (or at least no resource-intensive jobs) running on the master node, we can still see that the JVM memory is at its limit and the master remains unresponsive.

Has anybody seen this before?

Upvotes: 0

Views: 1036

Answers (1)

RobertL

Reputation: 76

Sorry everyone. I cannot say exactly what helped, as I took multiple steps and only put a relevant load on Jenkins after I had completed everything.

So here's what I did:

  1. Updated all packages (including Jenkins, it turns out we were using an older version)

  2. Installed OpenJDK 11 (we were previously using OpenJDK 8)

  3. Configured some of the JVM options mentioned in this support article by CloudBees: https://support.cloudbees.com/hc/en-us/articles/222446987-Prepare-Jenkins-for-Support

    On the CentOS machine Jenkins is running on, these values need to be added to the JENKINS_JAVA_OPTIONS variable in /etc/sysconfig/jenkins. I skipped any JVM setting that clearly only refers to logging, so what I ended up adding was:

    -Xms4096m
    -Xmx4096m # This was already set, but the minimum heap size was not defined. Jenkins best practices say both values should be the same
    -XX:+AlwaysPreTouch
    -XX:+UseG1GC
    -XX:+UseStringDeduplication
    -XX:+ParallelRefProcEnabled
    -XX:+DisableExplicitGC
    
  4. Configured ulimit as described in the article mentioned above. I therefore added the following to /etc/security/limits.conf:

    # Values are tab separated
    jenkins soft  core  unlimited
    jenkins hard  core  unlimited
    jenkins soft  fsize unlimited
    jenkins hard  fsize unlimited
    jenkins soft  nofile  4096
    jenkins hard  nofile  8192
    jenkins soft  nproc 30654
    jenkins hard  nproc 30654
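
For step 3, here is a rough sketch of how the resulting line in /etc/sysconfig/jenkins might look, with the minimum and maximum heap size set to the same value. It assumes the file is read as shell-style variable assignments; any options that were already present in the variable are not shown and should be kept.

```shell
# Sketch of the JENKINS_JAVA_OPTIONS line in /etc/sysconfig/jenkins.
# Assumption: options that were already in the variable before are kept
# alongside these; they are omitted here.
JENKINS_JAVA_OPTIONS="-Xms4096m -Xmx4096m -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:+UseStringDeduplication -XX:+ParallelRefProcEnabled -XX:+DisableExplicitGC"
```

Keeping everything on a single line avoids depending on how the file is parsed by whatever starts the service.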
    

After all that I did a reboot, started multiple deployment jobs and committed a bunch of patch sets to Gerrit, which in turn triggered a whole bunch of test jobs. Jenkins stayed responsive and my requests did not run into any timeouts, so I consider this fixed.
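
Whether the entries in /etc/security/limits.conf actually reach the Jenkins process can depend on how the service is started, so it may be worth checking the running process after the reboot. The following is a minimal sketch: `show_limits` is a hypothetical helper name, and it only assumes the usual layout of /proc/<pid>/limits, where columns are separated by runs of spaces.

```shell
# Print the soft and hard limits for open files and processes from a
# /proc/<pid>/limits style file passed as the first argument.
# show_limits is a made-up helper name, not part of any Jenkins tooling.
show_limits() {
  # Columns in /proc/<pid>/limits are separated by two or more spaces.
  awk -F'  +' '/^Max (open files|processes)/ { print $1 ": soft=" $2 " hard=" $3 }' "$1"
}

# Usage on the Jenkins host (assumes Jenkins runs as user "jenkins" in a
# process whose name matches "java"):
#   show_limits "/proc/$(pgrep -u jenkins -o java)/limits"
```

If the values printed differ from the ones configured in step 4, the limits are probably being set somewhere else for that service.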

Hope this helps anybody that comes along.

Upvotes: 1
