Wysawyg
Wysawyg

Reputation: 722

Scala Concurrency slowing down

I'm doing to preface this with the fact I'm a relative Java/Scala newbie so I wouldn't rule out that there is something obvious I'm not doing.

I've got a Scala application which connects via Hibernate to a MySQL database. The Application is designed to process a large amount of data, about 2,750,000 records so I've tried to optimise it as much as possible.

It's running on my workstation which is a QuadCore Intel Xeon with 6Gb of RAM (at 1033Mhz) and it runs nicely and speedily for the first 70k records, completing them in about 15 minutes. By the time, it's got to 90k, it's taken about 25 minutes so something is making it slow to a crawl.

I've checked the timers on the Hibernate code and the database retrieval is taking the same about of time as usual. I've even tried forcing manual Garbage Collection to try and do that but that isn't working either.

The code in question looks something like:

val recordCount = repo.recordCount
val batchSize = 100
val batches = (0 to recordCount by batchSize).toList
val batchJobs = {
    for (batchStart <- batches) yield {
        future(new RecordFormatter().formatRecords(new Repo(sessionFactory.openSession),batchStart,batchSize)
    }
awaitAll(100000,batchJobs: *_)

Inside the RecordFormatter (which isn't actually named that in case you wonder at my naming scheme madness), it does a query for the next 100 records then another query to pull back the actual records (using between on the start and end values) then writes them out to a text file as CSV. Looking at the timer output, each operation within the record formatter takes about 5 seconds to pull back the records and then 0.1 of a second to output it to file.

Despite this once it has slowed down, it is only processing about 12 batches of 100 records per minute as opposed to 40 batches of 100 records per minute when the process first starts.

It's flushing the Session at regular intervals and closing it at the end of each RecordFormatter run (each RecordFormatter has its own session).

I'm mostly looking for any known gotchas with Scala and the Futures. I have noticed that when its slowing down, it doesn't seem to be using all eight possible threads which could certainly explain the drop in speed but it's a mystery to me why it would suddenly stop and always around the 75k record mark.

Thanks!

EDIT: Updated code to show it uses yield and awaitAll in case that makes a difference.

Upvotes: 4

Views: 523

Answers (3)

Erik Engbrecht
Erik Engbrecht

Reputation: 3164

Try bounding the maximum number of threads that the actor library will create (futures are backed by actors). The actor threads are EXTREMELY heavyweight, and under certain conditions the scheduler will create them like there's no tomorrow. This uses up a ton of heap space and can make your program spend huge amounts of time performing garbage collection.

This can be done by setting the actors.maxPoolSize parameter at the command line...which would be something like: -Dactors.maxPoolSize=32 or whatever the max number of threads you want is.

I also highly recommend running your program -Xprof to see how much time the GC is consuming.

Upvotes: 3

oxbow_lakes
oxbow_lakes

Reputation: 134270

The jconsole application, which comes bundled with the JDK (in $JAVA_HOME/bin/jconsole) can be used to attach to the application as it runs. This is very good for telling you a few things:

  1. Is the app spending all of its time doing GC?
  2. What are the application threads doing?

Could you post the results here?

Upvotes: 2

Daniel C. Sobral
Daniel C. Sobral

Reputation: 297195

It looks like a memory problem. I'd get a dump on memory usage, see how it is behaving. If the gc times are increasing too much, you have your culprit. You could, then, just increase the memory available to the JVM to get it going again.

Anyway, don't convert batches into a List. It is unnecessary. It would be necessary if you were using for/yield (on Scala 2.7), but since you are not yielding anything, then Range is a better choice.

Upvotes: 2

Related Questions