Reputation: 1222
This is a difficult one to explain, and I'm not hopeful there's a single, simple answer, but I thought it was worth a shot. I'm interested in what might slow down a long Python job that interacts with a Java application.
We have an instance of Tomcat running a fairly complex and robust webapp called Fedora Commons (not to be confused with Fedora the OS), software for storing digital objects. Additionally, we have Python middleware that performs long background jobs with Celery. One particular job is ingesting a 400+ page book, where each page of the book has a large TIFF file, plus some smaller PDF, XML, and metadata files. Over the course of 10-15 minutes, derivatives are created from these files and they are added to a single object in Fedora.
Our problem: over the course of ingesting one book, adding files to the digital object in the Java app Fedora Commons slows down very consistently and predictably, but I can't figure out how or why.
I thought a graph of the ingest speeds might help; perhaps it reveals a common memory-management pattern that those more experienced with Java might recognize:
The top-left graph times large TIFFs being converted to JP2 and then ingested into Fedora Commons. The bottom-left graph times very small XML files, ingested with no derivative being made. As you can see, the slopes of their slowdown curves are almost identical. On the right, the two processes are graphed together.
I've been all over the internet trying to learn about Java garbage collection (GC) and trying different configurations, but nothing has had much effect on the slowdown. If it helps, here are the memory configurations we're passing to Tomcat (the tail end of which, I believe, is mostly diagnostic):
JAVA_OPTS='-server -Xms1g -Xmx1g -XX:+UseG1GC -XX:+DisableExplicitGC -XX:SurvivorRatio=10 -XX:TargetSurvivorRatio=90 -verbose:gc -Xloggc:/var/log/tomcat7/ggc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC'
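Since those flags already write a GC log via `-Xloggc`, one quick sanity check is to sum the pause durations recorded there and see whether cumulative GC time actually grows over the course of an ingest. A minimal sketch in Python (the log format varies by JVM version and flags, so the regex for the trailing `, N.NNNNNNN secs]` of each pause record may need adjusting):

```python
import re

# PrintGCDetails pause records end with ", 0.0123456 secs]";
# the exact line layout differs between JVM versions.
PAUSE = re.compile(r", (\d+\.\d+) secs\]")

def total_gc_seconds(log_text):
    """Sum every GC pause duration found in a PrintGCDetails-style log."""
    return sum(float(m) for m in PAUSE.findall(log_text))

# Two synthetic example lines, just to show the shape:
sample = (
    "12.345: [GC pause (young) 512M->128M(1024M), 0.0150000 secs]\n"
    "13.456: [GC pause (young) 600M->130M(1024M), 0.0250000 secs]\n"
)
print(round(total_gc_seconds(sample), 3))  # -> 0.04
```

If the total GC time stays small and flat while the per-file ingest time keeps climbing, that would point away from GC as the cause.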
We're working with 12GB of RAM on this VM.
I realize the number of factors that might result in this behavior is, excuse the pun, off the charts. But we've worked with Fedora Commons and our Python middleware for quite some time, and have been mostly successful. A slowdown you could set your watch to just feels suspiciously Java / garbage-collection related, though I could be very wrong about that too.
Any help or advice for digging in more is appreciated!
Upvotes: 0
Views: 384
Reputation: 1222
Thanks to all for the suggestions around GC and Tomcat analysis. It turns out the slowdown was entirely due to the way Fedora Commons builds digital objects. I was able to isolate this by creating an extremely simple digital object, iteratively adding near zero-size datastreams and watching the progress. You can see this in the graph below:
The curve of the slowdown was almost identical, which suggested the cause was not our particular ingest method or file sizes. It also prompted me to dig back into old forum posts about Fedora Commons, which confirm that single objects are not meant to contain a large number of datastreams.
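For anyone wanting to reproduce this kind of isolation test, the harness can be as simple as timing each add in a loop and plotting the durations. A minimal sketch; the `add_datastream` call mentioned in the comment is a hypothetical stand-in for whatever client call hits Fedora's REST API, not a real function here:

```python
import time

def time_iterations(action, n):
    """Call action(i) n times and record each call's wall-clock duration."""
    durations = []
    for i in range(n):
        start = time.perf_counter()
        action(i)
        durations.append(time.perf_counter() - start)
    return durations

# In the real test, `action` would POST a near zero-size datastream to one
# object, e.g. action = lambda i: add_datastream(pid, f"DS{i}", b"x").
# Plotting `durations` against the iteration index reproduces the curve above.
durations = time_iterations(lambda i: None, 500)
print(len(durations))  # -> 500
```

Because the payloads are near zero-size, any upward trend in `durations` comes from the repository's per-datastream overhead rather than from file processing.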
It is perhaps interesting that this knowledge was obscured behind the intellectual organization of digital objects, rather than stated in terms of the performance hits you take with Fedora, but that's probably fodder for another forum.
Thanks again to all for the suggestions - if nothing else, our normal usage of Fedora is now more finely tuned and humming along better than before.
Upvotes: 0
Reputation: 8529
You say you suspect GC is the problem, but you show no GC metrics. Put your program through a profiler and see whether the GC is actually overloaded. It is hard to solve a problem without identifying the cause.
Once you have found where the problem lies, you will likely need to change the code rather than just tweak GC settings.
Upvotes: 0
Reputation: 974
Well, instead of looking into obscure GC settings, you might want to start managing memory explicitly, so that the GC doesn't affect your execution as much.
Upvotes: -1