Faraz

Reputation: 6265

Would encoding a string take less memory?

I am confused about how much memory a string takes up. Let's say we have the following string:

String from = "Hello I am from Chicago";

If I encode it using Huffman Coding, LZ4, or GZip, etc. (the encoding algorithm does not matter at this point):

String encodedFrom = encodingLibrary.encode(from);

Would encodedFrom now take up less memory than from does?

I am confused because when we store strings, even if there are fewer bits, the VM (or the OS itself, or something else) would add padding at the end to fill up the last byte, or something like that. So at the end of the day the memory size (not consumption) is the same for both the encoded and the unencoded String. Am I right?

My second question is directly related to the first one: I actually want to encode hundreds of thousands of records and store them in a Redis cache. How would that play out, if we leave aside the time it takes to compress and decompress the strings and the memory consumed while doing so? Would the encoded strings take up less space in the Redis cache?

Appreciate any help.

Upvotes: 3

Views: 1342

Answers (1)

Roland Illig

Reputation: 41625

Compressing a string and then storing the compressed result back in another string is a very bad idea.

Strings, by convention, are sequences of characters. They are supposed to contain letters, punctuation, whitespace and similar stuff. Whoever discovers that you use them to store binary data will be mad at you since that is very unusual.

If you ever want to compress strings, be as honest as possible and store the compressed data in a byte array. Byte arrays are universal containers, and storing arbitrary data in them is to be expected.
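To illustrate the byte array advice, here is a minimal sketch that uses the JDK's built-in java.util.zip classes. The class name GzipBytes and its method names are made up for this example, and readAllBytes() assumes Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, named only for this example.
public class GzipBytes {

    // Compresses the UTF-8 bytes of the text. The result is binary data,
    // so it is returned as a byte[] and never converted back to a String.
    static byte[] compress(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    // Reverses compress(): gunzips the bytes and decodes them as UTF-8.
    static String decompress(byte[] compressed) throws IOException {
        try (GZIPInputStream gzip =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        String from = "Hello I am from Chicago";
        byte[] compressed = compress(from);
        System.out.println("UTF-8 bytes: " + from.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("gzip bytes:  " + compressed.length);
        System.out.println("round trip:  " + decompress(compressed).equals(from));
    }
}
```

Note that the gzip format adds a fixed header and trailer of 18 bytes in total, so for a string as short as the one in the question the compressed form should come out larger than the original 23 bytes. General-purpose compression tends to pay off only for longer or more repetitive data.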

Back to your main question. In Java, a String is basically a char[], which means that each character consumes 16 bits (as long as you are dealing with plain English or other characters from the Basic Multilingual Plane).

Since all your characters are ASCII, each of them can be encoded using 7 bits. Add another bit at the very front saying "the rest of this string is ASCII-only", and you end up with a simple compression scheme that needs 1 + 23 * 7 = 162 bits, which rounds up to 21 bytes. Sure, in this case there are 6 bits of padding in the last byte, but compared to the 2 * 23 = 46 bytes for storing the string as-is, that's already good.
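The 7-bit scheme can be sketched as follows. This is only an illustration, not a library API: the class name SevenBitPacker and its methods are hypothetical, and unpack assumes that the caller stores the character count separately, because the padding bits alone do not tell where the text ends.

```java
// Hypothetical helper class, named only for this example.
public class SevenBitPacker {

    // Packs an ASCII-only string: one flag bit (1 = "ASCII-only") at the
    // very front, followed by 7 bits per character, most significant bit first.
    static byte[] pack(String text) {
        int totalBits = 1 + 7 * text.length();
        byte[] out = new byte[(totalBits + 7) / 8]; // round up to whole bytes
        out[0] |= (byte) 0x80;                      // set the flag bit
        int bitPos = 1;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c > 127) {
                throw new IllegalArgumentException("not an ASCII character: " + c);
            }
            for (int b = 6; b >= 0; b--, bitPos++) {
                if (((c >> b) & 1) != 0) {
                    out[bitPos / 8] |= (byte) (0x80 >> (bitPos % 8));
                }
            }
        }
        return out; // unused bits in the last byte stay 0 (the padding)
    }

    // Unpacks charCount characters. The count is an assumption of this
    // sketch: it has to be kept alongside the packed bytes.
    static String unpack(byte[] packed, int charCount) {
        StringBuilder sb = new StringBuilder(charCount);
        int bitPos = 1; // skip the flag bit
        for (int i = 0; i < charCount; i++) {
            int c = 0;
            for (int b = 0; b < 7; b++, bitPos++) {
                int bit = (packed[bitPos / 8] >> (7 - bitPos % 8)) & 1;
                c = (c << 1) | bit;
            }
            sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String from = "Hello I am from Chicago";
        byte[] packed = pack(from);
        System.out.println(from.length() + " chars -> " + packed.length + " bytes");
        System.out.println(unpack(packed, from.length()));
    }
}
```

For the 23-character string from the question, pack returns 21 bytes, which matches the calculation above.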

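(I almost forgot: Since Java 9, strings that contain only ASCII characters (more precisely, only Latin-1 characters) are stored in a compact form that uses only 8 bits per character instead of 16. So the compression scheme I proposed above only pays off up to Java 8; from Java 9 on it would save just 2 of the 23 bytes.)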

When using Redis for data storage, keep in mind that it keeps all data in RAM and crashes once it cannot allocate any more RAM. (That's as far as I remember, from about 5 years ago.) Therefore, if you can already estimate that the Redis data will grow larger than a few gigabytes, you had better choose another data store.

Upvotes: 4
