Santosh Tiwari
Santosh Tiwari

Reputation: 1216

Packing many bounded integers into a large single integer

I have to store millions of entries in a database. Each entry is identified by a set of unique integer identifiers. For example a value may be identified by a set of 10 integer identifiers, each of which are less than 100 million.

In order to reduce the size of the database, I thought of the following encoding using a single 32 bit integer value.

Identifier 1: 0 - 100,000,000
Identifier 2: 100,000,001 - 200,000,000
.
.
.
Identifier 10: 900,000,001 - 1,000,000,000

I am using Java. I can write a simple method to encode/decode. The user code does not have to know that I am encoding/decoding during fetch/store.

What I want to know is: what is the most efficient (fastest) and recommended way to implement such encoding/decoding. A simple implementation will perform a large number of multiplications/subtractions.

Is it possible to use shifts (or bitwise operations) and choose different partition size (the size of each segment still has to be close to 100 million)?

I am open to any suggestions, ideas, or even a totally different scheme. I want to exploit the fact that the integer identifiers are bounded to drastically reduce the storage size without noticeably compromising performance.

Edit: I just wanted to add that I went through some of the answers posted on this forum. A common solution was to split the bits for each identifier. If I use 2 bits for each identifier for a total of 10 identifiers, then my range of identifiers gets severely limited.

Upvotes: 0

Views: 662

Answers (3)

Helmuth M.
Helmuth M.

Reputation: 541

It sounds like you want to pack multiple integer values of 0...100m into a single 32bit Integer? Unless you are omitting important information that would allow to store these 0...100m values more efficiently, there is simply no way to do it.

ceil(log2(100m)) = 27bit, which means you have only 5 "spare bits".

Upvotes: 1

Bohemian
Bohemian

Reputation: 425198

It's unclear what you actually want to do, but it sounds like you want an integer value, each bit representing having a particular attribute, and applying a bitmask.

A 32-bit integer can save 32 different attributes, 64-bit 64 etc. To have more, you'll need multiple integer columns.

If that's not it, I don't know what you mean by "encode".

Upvotes: 1

Peter Lawrey
Peter Lawrey

Reputation: 533670

You can make the segmentation size 27 bits which gives you 32 * 128 M segements. instead of 42 * 100 M

int value = 
int high = value >>> 27;
int low = value & ((1L << 27) -1);

It is worth nothing this calculation is likely to be trivial compared to the cost of using a database.

Upvotes: 1

Related Questions