Amanda

Reputation: 971

Cassandra compression codebase

I want to know exactly how many bytes are stored on disk when I insert a new column into a Column Family in Cassandra. My main problem is that I need this information when columns are compressed with Snappy. I know how to calculate the raw bytes but, because the data varies so much, I cannot properly approximate the compression ratio. Any information about where to find this byte count in the Cassandra codebase would be welcome.

Thanks in advance.

Upvotes: 4

Views: 351

Answers (1)

Stephen Connolly

Reputation: 14106

Compression can never give guaranteed compression ratios. The best you can get is an average ratio for sample data.

So get a load of sample data, insert it into a test instance, and measure the disk usage.
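
As a rough sketch of the "measure the disk usage" step, the Python below sums the size of every file under a directory and divides it by the raw byte count you already know how to calculate. The function names and the example path are hypothetical, and the location of your test instance's data directory depends on its configuration. Memtables need to be flushed to disk first (for example with `nodetool flush`), otherwise recently inserted data will not show up in the files yet.

```python
import os


def directory_size_bytes(root):
    """Return the total size in bytes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so the same file is not counted twice.
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def observed_ratio(raw_bytes, on_disk_bytes):
    """Return on-disk size divided by raw size; below 1.0 means space was saved."""
    if raw_bytes <= 0:
        raise ValueError("raw_bytes must be positive")
    return on_disk_bytes / raw_bytes
```

Usage would look something like `observed_ratio(raw_total, directory_size_bytes("/path/to/test/data/dir"))`, where the path is a placeholder for wherever your test instance keeps the column family's files. Measuring before and after the insert and taking the difference isolates the sample data from anything already on disk.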

You might have data that compresses very poorly with Snappy and actually results in more on-disk usage than storing raw bytes.
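
To illustrate how much the ratio depends on the data, here is a small sketch that compares repetitive bytes with random bytes. Note that it uses `zlib` from the Python standard library purely as a stand-in, because Snappy is not in the standard library; the absolute numbers will differ from what Snappy and Cassandra produce, but the effect (incompressible input ends up slightly larger than the raw bytes) is the same kind of thing. The function name is made up for this example.

```python
import os
import zlib


def compression_ratio(data):
    """Return compressed size divided by raw size, using zlib as a stand-in codec."""
    if not data:
        raise ValueError("data must not be empty")
    return len(zlib.compress(data)) / len(data)


# Highly repetitive input compresses very well.
repetitive = b"cassandra" * 1000

# Random input has no redundancy to exploit, so the codec's framing
# overhead makes the output slightly larger than the input.
random_like = os.urandom(9000)

print("repetitive:", compression_ratio(repetitive))
print("random:    ", compression_ratio(random_like))
```

This only shows the trend; for real numbers you still have to load representative data into a test instance and look at the files on disk.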

When it comes to compressing your data, there is one and only one rule: MEASURE.

Upvotes: 2
