I want to run a few tests on the disk space Cassandra uses after every write operation, and I am pushing a large amount of data during these tests. I have one keyspace and one table created in Cassandra, and I want to know how much disk space Cassandra uses for the data I pushed into that table.
I noticed that when I insert some data, it isn't reflected immediately in the data directory inside the Cassandra installation directory. So I stopped Cassandra and restarted it. After that, I could see some files/directories created inside the keyspace/data folder.
But the data I pushed to Cassandra should amount to several MB, because it's huge, yet the data directory is only a few KB while the commit log is 32,768 KB. So I am not sure whether the data is force-flushed from the commit log to SSTables after restarting Cassandra.
As I am new to Cassandra, I am struggling to get the exact disk space used by Cassandra. Do I need to follow other steps, or does Cassandra internally compress data to a huge extent?
I have one keyspace and a table created in Cassandra. I want to know the disk space used by Cassandra for the data which I pushed into a table.
Probably the easiest way to accomplish this would be to use du:
$ du -h --max-depth=1 data
2.4M data/system
500K data/system_schema
0 data/system_traces
0 data/system_distributed
448K data/system_auth
36G data/dev
36G data
If you set --max-depth=2 or point du at a specific keyspace directory, you can also see the actual on-disk usage per table.
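To get a per-table breakdown for one keyspace, sorted by size, a small shell function along these lines may help. It is only a sketch: the name table_usage is hypothetical, and it assumes the Cassandra 3 layout where each table lives in its own <data_dir>/<keyspace>/<table>-<table_id>/ directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Cassandra): print the on-disk size of each
# table directory in one keyspace, largest first.
# Assumes the layout <data_dir>/<keyspace>/<table>-<table_id>/.
table_usage() {
  local data_dir="$1" keyspace="$2"
  # -s: one summary line per table directory; -k: sizes in KiB so that
  # sort -n compares plain numbers (du -h output would not sort correctly)
  du -sk "$data_dir/$keyspace"/*/ | sort -rn
}

# Example call, using the "data" directory and "dev" keyspace from above:
# table_usage data dev
```

Each output line is the size in KiB followed by the table directory, so the table taking the most space is listed first.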
I am not sure if data is force-flushed from the commit log to SSTables or not after restarting Cassandra.
Yes, it absolutely is. Data you write is immediately recorded in the commit log (on disk), but it is not written to an SSTable until the memtables are flushed. A restart forces this, because any remaining commit log segments are replayed at startup and the replayed data is then flushed to SSTables.
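If you want to measure after each test run without restarting, you can trigger the flush yourself with nodetool flush, which writes the table's memtables out to SSTables, and then measure the table directory. A minimal sketch, assuming nodetool is on the PATH and the data directory is /var/lib/cassandra/data (the usual location for package installs; tarball installs default to $CASSANDRA_HOME/data/data). The function name flush_and_measure is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: flush one table's memtables to SSTables, then report
# the size of that table's directory. The default data path is an assumption.
flush_and_measure() {
  local keyspace="$1" table="$2"
  local data_dir="${3:-/var/lib/cassandra/data}"   # assumed default location
  # Force the in-memory data for this table onto disk as SSTable files;
  # give up if nodetool fails (e.g. the node is not running)
  nodetool flush "$keyspace" "$table" || return 1
  # Summarize the table directory (named <table>-<table_id>) in KiB
  du -sk "$data_dir/$keyspace/$table"-*/
}

# Example: flush_and_measure dev mytable /path/to/cassandra/data/data
```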
Or does Cassandra internally compress data to a huge extent?
Assuming you mostly accepted the defaults, your table definition should contain this line, which means SSTables are compressed with LZ4 in 64 KB chunks:
AND compression = {'chunk_length_in_kb': '64',
'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
Also, relative to prior versions, the Cassandra 3 storage engine is very efficient in its use of disk space.
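To see how much compression is actually saving for a given table, nodetool reports it directly. A sketch, assuming Cassandra 3.x, where nodetool tablestats <keyspace>.<table> prints a "Space used (live)" line (in bytes) and an "SSTable Compression Ratio" line; older releases call the command nodetool cfstats. The function name table_compression is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract live disk usage and the compression ratio from
# `nodetool tablestats` output (field names assume Cassandra 3.x).
table_compression() {
  local keyspace="$1" table="$2"
  nodetool tablestats "$keyspace.$table" |
    awk -F': ' '/Space used \(live\)|SSTable Compression Ratio/ {
      sub(/^[ \t]+/, "", $1)   # drop the indentation nodetool adds
      print $1 "=" $2          # e.g. "SSTable Compression Ratio=0.25"
    }'
}
```

The ratio is compressed size divided by uncompressed size, so a value of 0.25 would mean the SSTables take roughly a quarter of the space the uncompressed data would.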