Reputation: 76
We are using a combination of Gerrit and Jenkins in a master/slave configuration: every time a new change is submitted to Gerrit, a bunch of test and linting jobs (7 to be exact) are triggered on Jenkins.
Quite often (especially when other jobs, e.g. deployments, are running as well) Jenkins becomes unresponsive for 10-15 minutes.
All Jenkins machines are VMs that should have enough resources to handle such a load.
We have already:
Even though there should be no jobs (or at least no resource-intensive jobs) running on the master node, we can still see that the JVM memory is at its limit and the master remains unresponsive.
Has anybody seen this before?
Upvotes: 0
Views: 1036
Reputation: 76
Sorry everyone. I cannot exactly say what helped as I took multiple steps and only put a relevant load on Jenkins after I had completed everything.
So here's what I did:
Updated all packages (including Jenkins, it turns out we were using an older version)
Installed OpenJDK 11 (we were previously using OpenJDK 8)
Configured some of the JVM options mentioned in this support article by Cloudbees: https://support.cloudbees.com/hc/en-us/articles/222446987-Prepare-Jenkins-for-Support
On the CentOS machine Jenkins is running on, these values need to be added to the JENKINS_JAVA_OPTIONS variable in /etc/sysconfig/jenkins.
I skipped any JVM setting that clearly only refers to logging, so what I ended up adding was:
-Xms4065m
-Xmx4096m # This was already set, but the minimum heap size was not defined. Jenkins best practices say both values should be the same
-XX:+AlwaysPreTouch
-XX:+UseG1GC
-XX:+UseStringDeduplication
-XX:+ParallelRefProcEnabled
-XX:+DisableExplicitGC
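For anyone unsure what that looks like in the file: /etc/sysconfig/jenkins is read as shell variable assignments, so the options above end up in one quoted string. This is only a sketch with the values copied as posted; if your file already has options in that variable, keep them and append rather than overwrite.

```shell
# Sketch of the relevant assignment in /etc/sysconfig/jenkins.
# Values are copied from the list above; merge with any options your
# file already contains instead of replacing them.
JENKINS_JAVA_OPTIONS="-Xms4065m -Xmx4096m \
 -XX:+AlwaysPreTouch \
 -XX:+UseG1GC \
 -XX:+UseStringDeduplication \
 -XX:+ParallelRefProcEnabled \
 -XX:+DisableExplicitGC"
```

The backslash at the end of each line is a line continuation inside the double quotes, so the variable still holds a single line of options. Jenkins has to be restarted for the change to take effect.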
Configured ulimit as described in the article mentioned above. I therefore added the following to /etc/security/limits.conf:
# Values are tab separated
jenkins soft core unlimited
jenkins hard core unlimited
jenkins soft fsize unlimited
jenkins hard fsize unlimited
jenkins soft nofile 4096
jenkins hard nofile 8192
jenkins soft nproc 30654
jenkins hard nproc 30654
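Since limits.conf is applied through PAM sessions, it can be worth checking that the running process actually picked up the new limits. A minimal sketch, assuming a Linux /proc filesystem; the helper name show_limits and the "jenkins.war" match are my own assumptions, not something from the article:

```shell
# Print selected resource limits of a running process from /proc (Linux only).
# show_limits is a hypothetical helper name used for illustration.
show_limits() {
  pid="$1"
  grep -E '^Max (open files|processes|core file size|file size)' "/proc/$pid/limits"
}

# Example: inspect the current shell itself.
show_limits "$$"

# For Jenkins you would pass the PID of its Java process instead, e.g.
# (assuming its command line contains "jenkins.war"):
#   show_limits "$(pgrep -f jenkins.war | head -n 1)"
```

If the numbers shown for the Jenkins process do not match what is in limits.conf, the service is probably started in a way that bypasses those settings.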
After all that I did a reboot, started multiple deployment jobs and committed a bunch of patch sets to Gerrit, which in turn triggered a whole bunch of test jobs. Jenkins stayed responsive and my requests did not run into any timeouts, so I consider this fixed.
Hope this helps anybody that comes along.
Upvotes: 1