Reputation: 1216
I have to store millions of entries in a database. Each entry is identified by a set of unique integer identifiers. For example a value may be identified by a set of 10 integer identifiers, each of which are less than 100 million.
In order to reduce the size of the database, I thought of the following encoding using a single 32 bit integer value.
Identifier 1: 0 - 100,000,000 Identifier 2: 100,000,001 - 200,000,000 . . . Identifier 10: 900,000,001 - 1,000,000,000
I am using Java. I can write a simple method to encode/decode. The user code does not have to know that I am encoding/decoding during fetch/store.
What I want to know is: what is the most efficient (fastest) and recommended way to implement such encoding/decoding. A simple implementation will perform a large number of multiplications/subtractions.
Is it possible to use shifts (or bitwise operations) and choose different partition size (the size of each segment still has to be close to 100 million)?
I am open to any suggestions, ideas, or even a totally different scheme. I want to exploit the fact that the integer identifiers are bounded to drastically reduce the storage size without noticeably compromising performance.
Edit: I just wanted to add that I went through some of the answers posted on this forum. A common solution was to split the bits for each identifier. If I use 2 bits for each identifier for a total of 10 identifiers, then my range of identifiers gets severely limited.
Upvotes: 0
Views: 662
Reputation: 541
It sounds like you want to pack multiple integer values of 0...100m into a single 32bit Integer? Unless you are omitting important information that would allow to store these 0...100m values more efficiently, there is simply no way to do it.
ceil(log2(100m)) = 27bit, which means you have only 5 "spare bits".
Upvotes: 1
Reputation: 425198
It's unclear what you actually want to do, but it sounds like you want an integer value, each bit representing having a particular attribute, and applying a bitmask.
A 32-bit integer can save 32 different attributes, 64-bit 64 etc. To have more, you'll need multiple integer columns.
If that's not it, I don't know what you mean by "encode".
Upvotes: 1
Reputation: 533670
You can make the segmentation size 27 bits which gives you 32 * 128 M segements. instead of 42 * 100 M
int value =
int high = value >>> 27;
int low = value & ((1L << 27) -1);
It is worth nothing this calculation is likely to be trivial compared to the cost of using a database.
Upvotes: 1