Reputation: 1705
I have many arrays of longs long[]
and I need to serialize them and save them to disk for later read, notice that each array has to be modified from time to time, but writes are infrequent while reads are frequent.
Usually my application needs only a small number of those loaded into memory at the same time.
Edits to each array can happen in batch in memory before the array are stored back to disk.
Each array has from hundreds up to a million elements.
In my application it is critical that loading the desired arrays into memory happens very fast.
In my case the long values in each array are, on average, quite near one to the other, i.e., the difference from one value to the next - if sorted within a single array- is less than an integer.
The solution adopting a trie-like structure as presented here seems not applicable for my case, since in that solution the array values are known and never change.
This solution here tells me to use ByteBuffer
and LongBuffer
to speed up I/O, but my idea would be to also store such arrays in the most compact way in order to speed-up the time needed to load them into main memory by reducing the size of what I need to read.
The intuition is to store the values sorted and store the difference between one value and the next one, which is - on average- within the Integer range and as such take less space.
But since this is not always true, I still cannot store all the values as integers, so this direction doesn't seem promising.
Am I missing something obvious?
What is the most efficient way, in I/O time, to achieve this?
EDIT In general, regarding performance as solely I/O time, without considering space on disk, this question has better answers.
Upvotes: 4
Views: 1361
Reputation: 98505
You may still encode array elements as ints with the following addition:
// The first int is the array length
buf.putInt(array.length);
long prev = 0;
for (long next : array) {
if (next - prev <= Integer.MAX_VALUE) {
// Delta is small. Change the sign and encode as int.
buf.putInt((int) (prev - next));
} else {
// Delta does not fit in 31 bits. Encode two parts of long.
buf.putInt((int) (next >>> 32));
buf.putInt((int) next);
}
prev = next;
}
Note that 31-bit delta will be encoded as negative int
. During decoding the highest (sign) bit will tell if the value is delta or a raw 63-bit long
. In the latter case you read next int
and compose a 63-bit long
from two ints:
// The first int is the array length
long[] array = new long[buf.getInt()];
long next = 0;
for (int i = 0; i < array.length; i++) {
int delta = buf.getInt();
if (delta <= 0) {
// Negative sign means the value is encoded as int delta.
next -= delta;
} else {
// Positive sign means the value is encoded as raw long.
// Read the second (lower) part of long and combine it with the higher part.
next = (long) delta << 32 | (buf.getInt() & 0xffffffffL);
}
array[i] = next;
}
This works if all values in the array are positive. If there are both positive and negative values, split them into two arrays.
By the way, a streaming compression like GZIP (or a faster alternative like LZ4) will also work well if the neighbour values are close. See GZIPOutputStream.
Upvotes: 2
Reputation: 82589
You seem to be placing a lot of emphasis on compactness and speed. To get these to a true minimum level is going to take a lot of optimization. And by a lot, I mean more than your typical developer ever has to deal with.
Instead of doing this yourself, you would be wise to research existing database solutions. The developers of these databases dedicate years to understanding the most efficient way to perform these operations, and the overhead is much lower than you might think. Not to mention the correctness and reliability you get for free.
I would use a stock database solution (just whip out a mysql, maria, or postgres instance and send it to town) and see if that meets your performance metrics. If it doesn't, find what specific metrics it doesn't meet and tune it to those. The things you're asking for require specialized knowledge of your data and the ability to experiment with it, something no one over the internet can do (or would be expected to do for free.)
Upvotes: 2