Mario Ortegón

Reputation: 18900

Java Object Serialization Performance tips

I must serialize a huge tree of objects (7,000) to disk. Originally we kept this tree in a database with Kodo, but it would make thousands upon thousands of queries to load the tree into memory, and it would take a good part of the local universe's available time.

I tried serialization for this and indeed I get a performance improvement. However, I get the feeling that I could improve this by writing my own, custom serialization code. I need to make loading this serialized object as fast as possible.

On my machine, serializing / deserializing these objects takes about 15 seconds. When loading them from the database, it takes around 40 seconds.

Any tips on what I could do to improve this performance, taking into consideration that, because the objects are in a tree, they reference each other?

Upvotes: 6

Views: 16874

Answers (9)

Pascal de Kloe

Reputation: 532

You can use Colfer to generate the beans, and Java's standard serialization performance will get a 10 - 1000x boost. Unless the size reaches over a GB, chances are you'll be well below a second.

Upvotes: 0

cherouvim

Reputation: 31903

Also, have a look at XStream, a library to serialize objects to XML and back again.

Upvotes: 0

Rich

Reputation: 15757

To avoid having to write your own serialization code, give Google Protocol Buffers a try. According to their site:

Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages – Java, C++, or Python

I've not used it, but have heard a lot of positive things about it. Plus, I have to maintain some custom serialization code, and it can be an absolute nightmare to do (let alone tracking down bugs), so getting someone else to do it for you is always a Good Thing.

Upvotes: 4

Andrey Vityuk

Reputation: 1019

I would recommend implementing custom writeObject() and readObject() methods. This way you can avoid writing the child nodes for each node in the tree. When you use default serialization, each node is serialized together with all its children.

For example, the writeObject() of a Tree class should iterate through all the nodes of the tree and write only the nodes' data (not the Node objects themselves), along with markers that identify the tree level.

You can look at LinkedList to see how these methods are implemented there. It uses the same approach to avoid writing the prev and next entries for every single entry.
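A minimal sketch of that idea, assuming a hypothetical FlatTree class whose nodes carry a single String payload (none of these names come from the question). The root field is transient, so default serialization skips the node graph, and writeObject() emits one (depth, data) record per node in pre-order, followed by a -1 end marker. An explicit stack is used instead of recursion, so the depth of the tree does not affect the call stack.

```java
import java.io.*;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical tree class, for illustration only.
class FlatTree implements Serializable {
    private static final long serialVersionUID = 1L;

    // Deliberately not Serializable: nodes are never written as objects.
    static final class Node {
        final String data; // note: writeUTF is limited to 65535 encoded bytes
        final List<Node> children = new ArrayList<>();
        Node(String data) { this.data = data; }
    }

    transient Node root; // skipped by default serialization

    FlatTree(Node root) { this.root = root; }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        Deque<Node> nodes = new ArrayDeque<>();
        Deque<Integer> depths = new ArrayDeque<>();
        if (root != null) { nodes.push(root); depths.push(0); }
        while (!nodes.isEmpty()) {
            Node n = nodes.pop();
            int depth = depths.pop();
            out.writeInt(depth);   // level marker
            out.writeUTF(n.data);  // node payload only
            // Push children in reverse so the first child is written first.
            for (int i = n.children.size() - 1; i >= 0; i--) {
                nodes.push(n.children.get(i));
                depths.push(depth + 1);
            }
        }
        out.writeInt(-1); // end marker
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        List<Node> path = new ArrayList<>(); // path.get(d) = last node seen at depth d
        for (int depth = in.readInt(); depth >= 0; depth = in.readInt()) {
            Node n = new Node(in.readUTF());
            if (depth == 0) root = n; else path.get(depth - 1).children.add(n);
            if (depth < path.size()) path.set(depth, n); else path.add(n);
        }
    }

    // Helper for trying it out: serialize to memory and read back.
    static FlatTree roundTrip(FlatTree tree) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(tree);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (FlatTree) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Whether this beats default serialization for a 7,000-node tree would need measuring; the main gain is that the stream contains no per-node object headers or child lists.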

Upvotes: 4

dogbane

Reputation: 274532

Don't forget to use the 'transient' keyword for instance variables that don't have to be serialized. This gives you a performance boost because you are no longer reading/writing unnecessary data.
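As a small illustration, assuming a hypothetical CachedNode class with a derived display label (both names are made up for this example): the label is marked transient, so it is not written to the stream, comes back as null after deserialization, and is rebuilt on first use.

```java
import java.io.*;

// Hypothetical class, for illustration only.
class CachedNode implements Serializable {
    private static final long serialVersionUID = 1L;

    final String name;
    transient String displayLabel; // derived from name, so not worth storing

    CachedNode(String name) {
        this.name = name;
        this.displayLabel = "Node: " + name;
    }

    // Rebuilds the transient value lazily, since it is null after deserialization.
    String label() {
        if (displayLabel == null) {
            displayLabel = "Node: " + name;
        }
        return displayLabel;
    }

    // Helper for trying it out: serialize to memory and read back.
    static CachedNode roundTrip(CachedNode node) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(node);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (CachedNode) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```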

Upvotes: 11

Tom Hawtin - tackline

Reputation: 147154

For performance, I'd suggest not using java.io serialisation at all. Instead get down on to the bytes yourself.

If you are going to java.io serialise the tree, you might need to make sure your recursion doesn't get too deep, either by flattening (as, say, TreeSet does) or by arranging to serialise the deepest nodes first (so you have back references rather than nested readObject calls).

I would be surprised if there wasn't a way in Kodo to read the entire tree in one (or a few) goes.
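One way the "bytes yourself" route could look, as a rough sketch: a hypothetical RawTreeCodec that flattens the tree breadth-first and writes each node as a (parent index, value) pair with DataOutputStream. The int payload and all names are assumptions for the example. Because parents always precede their children in the list, decoding is a single loop with no recursion.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

// Hypothetical codec, for illustration only.
class RawTreeCodec {
    static final class Node {
        final int value;
        final List<Node> children = new ArrayList<>();
        Node(int value) { this.value = value; }
    }

    static byte[] encode(Node root) {
        // Breadth-first flattening: order.get(i) has its parent at parents.get(i).
        List<Node> order = new ArrayList<>();
        List<Integer> parents = new ArrayList<>();
        order.add(root);
        parents.add(-1); // the root has no parent
        for (int i = 0; i < order.size(); i++) {
            for (Node child : order.get(i).children) {
                order.add(child);
                parents.add(i);
            }
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(order.size());
            for (int i = 0; i < order.size(); i++) {
                out.writeInt(parents.get(i));
                out.writeInt(order.get(i).value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Node decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            Node[] nodes = new Node[count];
            for (int i = 0; i < count; i++) {
                int parent = in.readInt();
                nodes[i] = new Node(in.readInt());
                // The parent was decoded earlier, so it can be linked immediately.
                if (parent >= 0) nodes[parent].children.add(nodes[i]);
            }
            return nodes[0];
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For a real file you would wrap a FileOutputStream in a BufferedOutputStream rather than writing to a byte array.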

Upvotes: 0

Esko Luontola

Reputation: 73625

One optimization is customizing the class descriptors, so that you store the class descriptors in a different database and in the object stream you only refer to them by ID. This reduces the space needed by the serialized data. See for example how in one project the classes SerialUtil and ClassesTable do it.

Making classes Externalizable instead of Serializable can give some performance benefits. The downside is that it requires lots of manual work.
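To give an idea of the manual work involved, here is a minimal sketch of an Externalizable class. NodeData and its two fields are hypothetical; the points to notice are the public no-argument constructor that deserialization requires and the fact that every field must be written and read by hand, in the same order.

```java
import java.io.*;

// Hypothetical payload class, for illustration only.
class NodeData implements Externalizable {
    int id;
    String label;

    // Externalizable requires a public no-arg constructor.
    public NodeData() {}

    NodeData(int id, String label) {
        this.id = id;
        this.label = label;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(id);
        out.writeUTF(label);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        // Fields must be read in exactly the order they were written.
        id = in.readInt();
        label = in.readUTF();
    }

    // Helper for trying it out: serialize to memory and read back.
    static NodeData roundTrip(NodeData data) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(data);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (NodeData) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```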

Then there are other serialization libraries, for example jserial, which can give better performance than Java's default serialization. Also, if the object graph does not include cycles, then it can be serialized a little bit faster, because the serializer does not need to keep track of objects it has seen (see "How does it work?" in jserial's FAQ).

Upvotes: 6

thr

Reputation: 19476

This is how I would do it, off the top of my head:

Serialization

  1. Serialize each object individually
  2. Assign each object a unique key
  3. When an object holds a reference to another object, put the unique key for that object in the object's place in the serialization. (I would use a UUID converted to binary.)
  4. Save each object into a file/database/storage using the unique key

Unserialization

  1. Start from an arbitrary object (usually the root, I suspect), unserialize it, put it in a map with its unique key as the index, and return it.
  2. When you step on an object key in the serialization stream, first check whether that object is already unserialized by looking up its unique key in the map. If it is, just grab it from there. If not, put a lazy-loading proxy in place of the real object; the proxy repeats these two steps for that object and has hooks to load the right object when you need it.

Edit: you might need to use two-pass serialization and unserialization if you have circular references in there. It complicates things a bit, but not that much.
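A simplified sketch of the scheme above, with several assumptions: KeyedStore and Item are hypothetical names, an in-memory map stands in for the file/database, and instead of a real lazy-loading proxy each item just holds its children's keys and resolves them on demand through child(). Since references are stored as keys rather than objects, a cycle between items does not cause trouble in this variant.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.UUID;

// Hypothetical store, for illustration only.
class KeyedStore {
    static final class Item {
        final UUID key;
        final String data;
        final List<UUID> childKeys = new ArrayList<>(); // references stored as keys
        Item(UUID key, String data) { this.key = key; this.data = data; }
    }

    // Stand-in for a file or database: key -> serialized record.
    final Map<UUID, byte[]> storage = new HashMap<>();
    // Items already unserialized, so each key is materialized only once.
    final Map<UUID, Item> loaded = new HashMap<>();

    void save(Item item) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(item.data);
            out.writeInt(item.childKeys.size());
            for (UUID k : item.childKeys) { // UUID as 16 binary bytes
                out.writeLong(k.getMostSignificantBits());
                out.writeLong(k.getLeastSignificantBits());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        storage.put(item.key, bytes.toByteArray());
    }

    Item load(UUID key) {
        Item cached = loaded.get(key);
        if (cached != null) return cached; // already unserialized
        byte[] record = storage.get(key);
        if (record == null) throw new NoSuchElementException("no record for " + key);
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(record))) {
            Item item = new Item(key, in.readUTF());
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                item.childKeys.add(new UUID(in.readLong(), in.readLong()));
            }
            loaded.put(key, item);
            return item;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Lazy step: a child is only read from storage when someone asks for it.
    Item child(Item parent, int index) {
        return load(parent.childKeys.get(index));
    }
}
```

Note that loading the whole tree this way still costs one lookup per object, so with a database as the backing store it could run into the same many-small-queries problem described in the question.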

Upvotes: 1

Maurice Perry

Reputation: 32831

Have you tried compressing the stream (GZIPOutputStream)?
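Wrapping the stream is a small change, roughly as in the sketch below (GzipSerialization and its methods are hypothetical helper names; the byte arrays stand in for a file). Compression trades CPU time for less I/O, so whether it actually speeds up loading depends on the data and the disk and is worth measuring.

```java
import java.io.*;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, for illustration only.
class GzipSerialization {
    static byte[] write(Serializable obj, boolean compress) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Closing the ObjectOutputStream also finishes the GZIP stream underneath.
        try (ObjectOutputStream out = new ObjectOutputStream(
                compress ? new GZIPOutputStream(bytes) : bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Object read(byte[] data, boolean compressed) {
        InputStream source = new ByteArrayInputStream(data);
        try (ObjectInputStream in = new ObjectInputStream(
                compressed ? new GZIPInputStream(source) : source)) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```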

Upvotes: 0
