Reputation: 971
I want to know exactly how many bytes are stored on disk when I insert a new column into a column family in Cassandra. My main problem is that I need this information when columns are compressed with Snappy. I know how to calculate the raw bytes, but because the data varies so much, I cannot properly estimate the compression ratio. Any pointers to where this byte count is computed in the Cassandra codebase would be welcome.
Thanks in advance.
Upvotes: 4
Views: 351
Reputation: 14106
Compression can never give guaranteed compression ratios. The best you can get is an average ratio for sample data.
So get a load of sample data, insert it into a test instance, and measure the disk usage.
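As a rough sketch of the "measure the disk usage" step, something like the following sums the file sizes under a directory. The function name and the example path are my own assumptions for illustration, not anything Cassandra provides, so point it at wherever your test instance actually keeps its data files.

```python
import os

def directory_size_bytes(root):
    """Sum the sizes of all regular files under root, recursively."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so the same file is not counted twice.
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total

# Hypothetical usage: the path is an assumption, adjust it to your install.
# before = directory_size_bytes("/var/lib/cassandra/data/my_keyspace")
# ... insert the sample data and flush it to disk ...
# after = directory_size_bytes("/var/lib/cassandra/data/my_keyspace")
# print(after - before)
```

Take the reading only after the data has been flushed from memory to disk (for example with `nodetool flush`), otherwise recent writes will not show up in the file sizes yet.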
You might have data that compresses very poorly with Snappy and actually results in more on-disk usage than storing raw bytes.
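To see how much the ratio depends on the data, here is a small sketch that compresses a buffer in fixed-size chunks and reports compressed size divided by raw size. Two loud assumptions: it uses `zlib` from the Python standard library as a stand-in because Snappy is not in the standard library, so the numbers will differ from what Snappy gives you, and the 64 KB chunk size is only an illustrative value, not necessarily what your table is configured with.

```python
import os
import zlib

def chunked_compression_ratio(data, chunk_size=64 * 1024):
    """Return compressed_size / raw_size when data is compressed chunk by chunk.

    zlib is a stand-in for Snappy here; chunk_size is an assumed value.
    """
    if not data:
        raise ValueError("data must not be empty")
    compressed = 0
    for start in range(0, len(data), chunk_size):
        compressed += len(zlib.compress(data[start:start + chunk_size]))
    return compressed / len(data)

# Made-up sample data: one highly repetitive buffer, one random buffer.
repetitive = b"user_id=42;status=active;" * 10_000
random_like = os.urandom(250_000)

print(chunked_compression_ratio(repetitive))   # far below 1.0
print(chunked_compression_ratio(random_like))  # about 1.0, typically slightly above
```

The random buffer is the poorly compressing case: the compressor finds nothing to exploit and its framing overhead makes the output no smaller than the input.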
When it comes to compressing your data, there is one and only one rule: MEASURE.
Upvotes: 2