Reputation: 161
I found my zookeeper dataDir
is huge. I would like to understand
Thanks
Upvotes: 2
Views: 3154
Reputation: 1048
According to Zookeeper's administrator guide:
The ZooKeeper Data Directory contains files which are a persistent copy of the znodes stored by a particular serving ensemble. These are the snapshot and transactional log files. As changes are made to the znodes these changes are appended to a transaction log, occasionally, when a log grows large, a snapshot of the current state of all znodes will be written to the filesystem. This snapshot supercedes all previous logs.
So in short, for your first question, you can assume that dataDir is used to store Zookeeper's state.
As for your second question, there is no automatic cleanup. From the doc:
A ZooKeeper server will not remove old snapshots and log files, this is the responsibility of the operator. Every serving environment is different and therefore the requirements of managing these files may differ from install to install (backup for example).
The PurgeTxnLog utility implements a simple retention policy that administrators can use. The API docs contains details on calling conventions (arguments, etc...).
In the following example the last count snapshots and their corresponding logs are retained and the others are deleted. The value of should typically be greater than 3 (although not required, this provides 3 backups in the unlikely event a recent log has become corrupted). This can be run as a cron job on the ZooKeeper server machines to clean up the logs daily.
java -cp zookeeper.jar:log4j.jar:conf org.apache.zookeeper.server.PurgeTxnLog <dataDir> <snapDir> -n <count>
If this is a dev instance, I guess you could just almost completely purge the folder (except some files like myid if its there). But for a production instance you should follow the cleanup procedure shown above.
Upvotes: 2