Random Test Failures in Jenkins

Question

I work on Python/Django websites. I use Git as my VCS and Jenkins CI for continuous integration. I have set up two Python virtual environments, one for development and the other for pre-production.

My issue: I have many unit, regression and smoke tests written for the website. Both my development and pre-production virtualenvs are connected to Jenkins CI.

Recently, tests have been failing randomly in both environments in Jenkins CI whenever code changes are pushed to them. Sometimes tests fail randomly even when no code changes have been pushed.

Troubleshooting done:

Ran the tests locally; they pass.
Ran some builds manually in Jenkins CI (using the Build Now button); the tests pass.
Ran the failing tests individually; they still pass.

Tests that failed in earlier builds passed in later builds, and some tests that passed in earlier builds failed in later ones. Can someone suggest what I can do?

Lee Meador · Accepted Answer

You are going to have to identify an environmental factor that causes the tests to fail randomly.

Some things I have seen cause this:

Memory - Other things are running on the CI machine, and it doesn't have enough memory to run all of them and build your stuff.
Time - There is something in your code that fails depending on the time. For example, I had code that would fail on Feb 29th. It surprised us after succeeding many times. It could be something like a failure to format the number of seconds when there is only one digit.
External dependencies - Your tests depend on some other server being up. If it goes down or gets really busy, it won't respond to your test code and the test fails. This could be a database server.
Database content - You might not have set all the preconditions correctly for a test that runs against the database.
Concurrency - Sometimes multi-threaded code will only fail when conditions are just right (or just wrong). A little random delay introduced by outside factors could make the code work or make it fail. It's easy to overlook race conditions in multi-threaded code.
Servers (or CPUs) - Sometimes a test will fail only when it runs on a particular server or core among the test machines. Of course, if you only have one test machine, this can't happen. But if one machine has something broken, poor connectivity (firewall rules), other processes running, or less (or more) memory, your tests could fail when they are randomly assigned to run on that one.
[Insert yours here] - And there are a million more.
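
To make the "Time" item concrete, here is a minimal sketch of one way a date-dependent bug can hide until Feb 29th, and how passing the date in as a parameter lets a test exercise that day on every build. The function names are hypothetical and are not taken from the question's code base; this is only an illustration of the general idea.

```python
import unittest
from datetime import date


def one_year_later(today):
    """Naive version: raises ValueError when `today` is Feb 29."""
    return today.replace(year=today.year + 1)


def one_year_later_safe(today):
    """Falls back to Feb 28 when the following year has no Feb 29."""
    try:
        return today.replace(year=today.year + 1)
    except ValueError:
        return today.replace(year=today.year + 1, day=28)


class OneYearLaterTest(unittest.TestCase):
    # The date is passed in rather than read from date.today(), so the
    # result does not depend on the day the CI build happens to run.
    def test_naive_version_breaks_on_leap_day(self):
        with self.assertRaises(ValueError):
            one_year_later(date(2024, 2, 29))

    def test_safe_version_handles_leap_day(self):
        self.assertEqual(one_year_later_safe(date(2024, 2, 29)), date(2025, 2, 28))

    def test_ordinary_day(self):
        self.assertEqual(one_year_later_safe(date(2023, 6, 15)), date(2024, 6, 15))
```

If the code under test called date.today() internally, the same tests would pass on most days and fail only on a leap day, which looks exactly like a random failure in CI.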

These are hard problems to solve, especially if they go away for no good reason. That makes you nervous, because you suspect the problem will come back just when you are in a big hurry to fix a nasty bug in the production system.
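
One possible way to start narrowing down the environmental factor is to record a snapshot of the machine at the start of every test run, then compare the snapshots of failing and passing builds. The sketch below uses only the standard library. The helper names are hypothetical, and it assumes the tests run under Jenkins, which sets the NODE_NAME and BUILD_NUMBER environment variables; outside Jenkins those fields are simply None.

```python
import json
import os
import platform
import socket
import sys
from datetime import datetime, timezone


def environment_snapshot():
    """Collect facts about the machine and the moment a test run started."""
    snapshot = {
        "utc_time": datetime.now(timezone.utc).isoformat(),
        "hostname": socket.gethostname(),
        "platform": platform.platform(),
        "python": sys.version.split()[0],
        "cpu_count": os.cpu_count(),
        "pid": os.getpid(),
        # Set by Jenkins for each build; None when run elsewhere.
        "jenkins_node": os.environ.get("NODE_NAME"),
        "jenkins_build": os.environ.get("BUILD_NUMBER"),
    }
    # os.getloadavg() exists only on Unix-like systems.
    if hasattr(os, "getloadavg"):
        try:
            snapshot["load_average"] = list(os.getloadavg())
        except OSError:
            snapshot["load_average"] = None
    return snapshot


def log_snapshot(stream=sys.stderr):
    """Write the snapshot as one JSON line so builds are easy to compare."""
    stream.write(json.dumps(environment_snapshot(), sort_keys=True) + "\n")
```

Calling log_snapshot() from a setUpModule function, or at the start of the Jenkins build step, puts one line per run in the console output. If the failures cluster on one hostname, at one time of day, or under a high load average, that points at the corresponding item in the list above.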
