Jihun No

Reputation: 1221

Smallest way to save java int[] with protocol buffer 3?

I have a complex object that holds millions of ints:

int[] ints = new int[1000000];

If I save those values directly via ByteBuffer, the file size is about 5MB.

When I save those values to a protocol buffer object, it stores each value not as an int but as an Integer. Then, when I write the resulting byte[] to a file, the file size is over 8MB.

It seems protocol buffers do not provide a primitive array type.

Is there any way (or trick) to reduce the byte[] size of a protocol buffer object that contains millions of ints?

Upvotes: 2

Views: 273

Answers (1)

Marc Gravell

Reputation: 1064054

When I save those values to a protocol buffer object

How exactly are you doing that? Normally, with protobuf, you define some type in a .proto schema; the obvious contender here would be:

syntax = "proto3";
message Whatever {
    repeated int32 ints = 1;
}

In proto3, "packed" is the default for repeated scalar numeric fields, so this should use "packed" encoding, giving a size that is... well, slightly dependent on the data, since it uses "varint" encoding. For 1,000,000 elements it could be anywhere between 1,000,004 and 10,000,005 bytes: between 1 and 10 bytes per element, 1 byte for the field header, and 3 or 4 bytes for the length prefix (3 when the payload is 1,000,000 bytes, 4 when it is 10,000,000). With int32, non-negative values take 1 to 5 bytes each; 10 bytes per element means negative numbers, which int32 sign-extends to 64 bits before encoding.
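To make that arithmetic concrete, here is a minimal sketch in plain Java (no protobuf library) that estimates the serialized size of a packed repeated int32 field with field number 1. The class and method names are made up for illustration; the generated protobuf classes compute this for you, so treat it as a way to reason about your data rather than something to ship.

```java
import java.util.Arrays;

// Hypothetical helper: estimates the wire size of `repeated int32 ints = 1;` in packed form.
final class PackedInt32Size {

    // Bytes needed to write an unsigned 64-bit value as a varint (7 payload bits per byte).
    static int varintSize(long value) {
        int size = 1;
        while ((value & ~0x7FL) != 0) {
            value >>>= 7;
            size++;
        }
        return size;
    }

    // int32 values are sign-extended to 64 bits before encoding,
    // so every negative value costs 10 bytes.
    static int int32Size(int value) {
        return varintSize((long) value);
    }

    // Tag + length prefix + payload. An empty packed field is not written at all.
    static long packedFieldSize(int[] values) {
        if (values.length == 0) {
            return 0;
        }
        long payload = 0;
        for (int v : values) {
            payload += int32Size(v);
        }
        int tagSize = 1; // field number 1 with wire type 2 (length-delimited) fits in one byte
        return tagSize + varintSize(payload) + payload;
    }

    public static void main(String[] args) {
        int[] zeros = new int[1_000_000];          // 1 byte per element
        int[] negatives = new int[1_000_000];
        Arrays.fill(negatives, -1);                // 10 bytes per element
        System.out.println(packedFieldSize(zeros));     // prints 1000004
        System.out.println(packedFieldSize(negatives)); // prints 10000005
    }
}
```

The length prefix is itself a varint, which is why it takes 3 bytes for a 1,000,000-byte payload and 4 bytes for a 10,000,000-byte one. Running packedFieldSize over your real array tells you where in that range your data falls.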

If you know the values are often negative, or often large, you could choose sint32 (uses zig-zag encoding; avoids the 10 bytes for negative numbers) or sfixed32 (always uses 4 bytes per element) instead of int32; "packed" still applies to both.
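As a rough way to choose between the three, the sketch below compares the per-element cost of int32, sint32 and sfixed32 for a few sample values. It is plain Java with hypothetical names and does not use the protobuf library; the zig-zag formula is the standard one for 32-bit values.

```java
// Hypothetical helper: per-element wire size under int32, sint32 and sfixed32.
final class Int32EncodingComparison {

    // Bytes needed to write an unsigned 64-bit value as a varint.
    static int varintSize(long value) {
        int size = 1;
        while ((value & ~0x7FL) != 0) {
            value >>>= 7;
            size++;
        }
        return size;
    }

    // Zig-zag maps small magnitudes to small unsigned values: 0->0, -1->1, 1->2, -2->3, ...
    static int zigZag(int n) {
        return (n << 1) ^ (n >> 31);
    }

    // int32: sign-extended to 64 bits, so negatives always take 10 bytes.
    static int int32Size(int v) {
        return varintSize((long) v);
    }

    // sint32: zig-zag first, then varint of the unsigned 32-bit result (1 to 5 bytes).
    static int sint32Size(int v) {
        return varintSize(zigZag(v) & 0xFFFFFFFFL);
    }

    // sfixed32: always 4 bytes, regardless of value.
    static int sfixed32Size(int v) {
        return 4;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 300, -300, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int v : samples) {
            System.out.printf("%12d  int32=%2d  sint32=%d  sfixed32=%d%n",
                    v, int32Size(v), sint32Size(v), sfixed32Size(v));
        }
    }
}
```

Roughly: mostly small non-negative values favour int32, small values of either sign favour sint32, and values that are large in magnitude (needing 5 varint bytes) favour sfixed32.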

In proto2, you need to opt in to "packed":

syntax = "proto2";
message Whatever {
    repeated int32 ints = 1 [packed=true];
}

Upvotes: 1
