icamts

Reputation: 186

Size of Java serialized Clojure data structures

I opened this issue on the prevayler-clj GitHub project:

https://github.com/klauswuestefeld/prevayler-clj/issues/1

The prevayler state consists of 1M short vectors like [:a1 1], and it results in a 1 GB file when the vectors are serialized one by one with Java's writeObject.

Is that possible? About 1 kB for each PersistentVector? Further investigation showed that the same number of vectors can be serialized into an 80 MB file. So what is going wrong in Prevayler's serialization? Am I doing something wrong in these tests? Please refer to the GitHub issue for excerpts of my test code.

Upvotes: 1

Views: 169

Answers (2)

Marko Topolnik

Reputation: 200236

Prevayler apparently starts a fresh ObjectOutputStream for each serialized element, preventing any reuse of class data between them. Your test code, on the other hand, is written the "natural" way, allowing reuse. What forces Prevayler to restart every time is not clear to me, but I would hesitate to call it a "feature", given the negative impact it has; "workaround" is the more likely designation.
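The difference between the two approaches can be reproduced with a minimal sketch. It assumes a plain ArrayList holding a String and an Integer as a stand-in for a short Clojure vector such as [:a1 1]; the class and method names are made up for illustration and are not Prevayler's code. The exact byte counts will differ from those of a PersistentVector, but the effect is the same: every fresh ObjectOutputStream writes a stream header and the full class descriptors again, while a shared stream writes them once and refers back to them afterwards.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class SerializationSizeDemo {

    // Hypothetical stand-in for a short Clojure vector like [:a1 1].
    static List<Object> sampleElement(int i) {
        List<Object> v = new ArrayList<>();
        v.add("a" + i);
        v.add(i);
        return v;
    }

    // Opens a new ObjectOutputStream for every element, so the stream
    // header and the class descriptors are written again each time.
    static long sizeWithFreshStreams(int count) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            for (int i = 0; i < count; i++) {
                ObjectOutputStream out = new ObjectOutputStream(bytes);
                out.writeObject(sampleElement(i));
                out.flush();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // Writes every element through one ObjectOutputStream, so class
    // descriptors are written once and later referenced by handle.
    static long sizeWithSharedStream(int count) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            for (int i = 0; i < count; i++) {
                out.writeObject(sampleElement(i));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        int count = 10_000;
        System.out.println("fresh stream per element: " + sizeWithFreshStreams(count) + " bytes");
        System.out.println("one shared stream:        " + sizeWithSharedStream(count) + " bytes");
    }
}
```

Running main prints both totals. A shared stream also keeps a handle to every object it has written until it is closed or reset() is called, which may be one reason a journal would prefer to start fresh streams.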

Upvotes: 1

Joost Diepenmaat

Reputation: 17771

There's nothing wrong with Prevayler per se. It's just that Java's writeObject method is not exactly tuned to writing Clojure data; it's intended to store the internal structure of any serializable Java object. Since Clojure vectors are reasonably complex Java objects under the hood, I'm not very surprised that a small vector may write out as roughly a kB of data.

I'd guess that pretty much any Clojure-specific serialization method would result in smaller files. From experience, standard clojure.core/pr + clojure.core/read gives a good balance between file size and speed and handles data structures of nearly any size.
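To get a feel for the gap between a structural dump and a textual form, here is a rough sketch in plain Java. It assumes an ArrayList as a stand-in for a small Clojure vector and uses toString() as a crude substitute for clojure.core/pr, which it is not; the class and method names are hypothetical. It only illustrates that writeObject stores class metadata alongside the data, whereas a printed form stores just the data.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class TextVsSerializedDemo {

    // Hypothetical stand-in for the Clojure vector [:a1 1].
    static List<Object> sample() {
        List<Object> v = new ArrayList<>();
        v.add("a1");
        v.add(1);
        return v;
    }

    // Bytes produced by a single writeObject call on a fresh stream,
    // including the stream header and class descriptors.
    static int serializedSize(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // Bytes of the printed form; toString() stands in for pr here.
    static int textSize(Object o) {
        return o.toString().getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        List<Object> v = sample();
        System.out.println("writeObject: " + serializedSize(v) + " bytes");
        System.out.println("text form:   " + textSize(v) + " bytes");
    }
}
```

The textual form here is "[a1, 1]", only a handful of bytes, while the serialized form also carries the descriptors for ArrayList, Integer and Number.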

See these pages for some insight into the internals of Clojure vectors:

Upvotes: 1
