Reputation: 10469
It seems that my KStream-based application has been piling up many GBs of files (.sst, Log.old.<stamp>, etc.).
Will these get cleaned up on their own or is this something I need to keep an eye on? Some param to be set to cull them?
Upvotes: 2
Views: 1640
Reputation: 15077
About these local/temp files: Some of these files are application state, and those should account for the majority of space consumed. Your application may be "piling up" many GBs of files simply because your application is actually managing a lot of state. These files can be reconstructed (automatically) by replaying the state's changelog from Kafka if you delete them, but this may take some time.
Will these get cleaned up on their own or is this something I need to keep an eye on? Some param to be set to cull them?
Some cleaning up is done, but as I wrote above, the files most probably consume that space for a reason. Perhaps you can share a snippet of the app's processing topology as well as some info about the data the app is processing, which might help to understand whether the consumed space seems about right or whether there might be an issue.
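To judge whether the consumed space "seems about right", it can help to see how it splits between state store files (.sst) and rotated log files (LOG.old.<stamp>). Below is a minimal sketch using only the JDK. The class name StateDirUsage and the bucket names are made up for illustration, and the default path /tmp/kafka-streams assumes you have not overridden state.dir in your Streams config, so pass your own directory otherwise.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical helper, not part of Kafka: sums file sizes under a state directory.
public class StateDirUsage {

    // Classify a file name into a coarse bucket: table files (.sst),
    // rotated info logs (LOG.old.<stamp>, matched case-insensitively), or other.
    static String bucketOf(String fileName) {
        if (fileName.endsWith(".sst")) {
            return "sst";
        }
        if (fileName.regionMatches(true, 0, "LOG.old", 0, 7)) {
            return "log.old";
        }
        return "other";
    }

    // Walk the directory tree and sum regular-file sizes per bucket.
    static Map<String, Long> usageByBucket(Path root) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (Stream<Path> paths = Files.walk(root)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                if (!Files.isRegularFile(p)) {
                    continue;
                }
                long size;
                try {
                    size = Files.size(p);
                } catch (IOException e) {
                    // A running app may delete a file between listing and sizing; skip it.
                    continue;
                }
                totals.merge(bucketOf(p.getFileName().toString()), size, Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: default state.dir; pass another path as the first argument.
        Path root = Paths.get(args.length > 0 ? args[0] : "/tmp/kafka-streams");
        usageByBucket(root).forEach((bucket, bytes) ->
                System.out.printf("%-8s %,d bytes%n", bucket, bytes));
    }
}
```

If most of the bytes land in the "sst" bucket, that is application state and consistent with what I described above. A large "log.old" share would be worth mentioning when you share details about your app.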
Clean up: The latest version of Kafka (0.10.0.1) now ships with an application reset tool for Kafka Streams plus some accompanying API methods (such as KafkaStreams#cleanUp()) that help with cleaning/resetting, see Data Reprocessing with Kafka Streams: Resetting a Streams Application. That said, I am not sure whether you intend to clean up files because you have stopped the application and want to get rid of all the local data, or because you want to do some "garbage collection" while the app is still running. If it's the latter (GC), then in general there's no need to: the files are there for a good reason, and if you delete them they will most probably just be recreated.
Upvotes: 3