tpoker

Reputation: 502

Cassandra: Difference between TEXT (VARCHAR) and ASCII

I understand that text and varchar are aliases, which store UTF-8 strings. What about ASCII, which in the documentation says "US-ASCII character string"? What's the difference besides encoding?

Is there any size difference? Is there a preferred choice between these two when I'm storing large strings (~500KB)?

Upvotes: 10

Views: 4798

Answers (1)

MD Ruhul Amin

Reputation: 4502

Regarding this answer:

If the data is a piece of text, for example a String in Java, it is encoded in UTF-16 at runtime, but when it is serialized in Cassandra with the text type, UTF-8 is used. UTF-16 uses 2 bytes per character, and sometimes 4 bytes, whereas UTF-8 is space efficient and, depending on the character, can be 1, 2, 3 or 4 bytes long.

That means there is CPU work to encode and decode such data during serialization. Also, depending on the text, for example 158786464563, the data will be stored in 12 bytes (one byte per digit). That means more space is used, and more IO as well.

Note that Cassandra offers the ascii type, which follows the US-ASCII character set and always uses 1 byte per character.
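The byte counts quoted above can be checked with a minimal sketch. It uses Python's built-in codecs as a stand-in for Cassandra's serialization, which is an assumption, and `encoded_sizes` is a hypothetical helper name. The last line compares the 12-digit example with an 8-byte signed integer, the size of a 64-bit type such as bigint, on the assumption that this is the comparison the quoted answer has in mind.

```python
import struct


def encoded_sizes(text):
    """Return the byte length of `text` under UTF-8, UTF-16 and US-ASCII."""
    sizes = {
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-be" avoids the 2-byte byte-order mark that "utf-16" prepends
        "utf-16": len(text.encode("utf-16-be")),
    }
    try:
        sizes["ascii"] = len(text.encode("ascii"))
    except UnicodeEncodeError:
        # The text contains characters outside US-ASCII
        sizes["ascii"] = None
    return sizes


print(encoded_sizes("hello"))      # ASCII-only text
print(encoded_sizes("héllo €"))    # text with non-ASCII characters

digits = "158786464563"
as_text = len(digits.encode("utf-8"))            # one byte per digit
as_int64 = len(struct.pack(">q", int(digits)))   # fixed-width 64-bit integer
print(as_text, as_int64)
```

For ASCII-only input the UTF-8 and ASCII lengths come out equal, while the UTF-16 length is double.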


Is there any size difference?

Yes

Is there a preferred choice between these two when I'm storing large strings (~500KB)?

Yes

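Because ascii always uses 1 byte per character, while UTF-8 uses 1 to 4 bytes depending on the character, and UTF-8 is in turn usually more space efficient than UTF-16. Note that UTF-8 also uses exactly 1 byte for each US-ASCII character, so text that contains only such characters takes the same number of bytes in either encoding. Again, all of this depends on how you are serializing, encoding and decoding the data. For more, check out "what-is-the-advantage-of-choosing-ascii-encoding-over-utf-8".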

Upvotes: 13
