Reputation: 535
I'm trying to understand and manage Flink's usage of local storage as in my use-case I need to ensure no data is stored without encryption.
Reading through the documentation I can see that
RocksDb may use local storage in case states are growing too big to be kept in memory.
Flink uses local fs in case the data kept in memory grows too big (e in batch jobs where big aggregation activities are done) http://flink.apache.org/news/2015/05/11/Juggling-with-Bits-and-Bytes.html
Is there any other situation that need to be considered in which local storage is used? The folder in both of the cases above seems driven by the property taskmanager.tmp.dirs
Is there anyone that could point me at which class should I look at if I want to write a specific data serializer/write for the case above that includes encryption?
Upvotes: 1
Views: 533
Reputation: 3219
The two main options for custom serialization are either to create a Kryo serializer and register it or to create a full TypeInformation + TypeSerializer (Flink-native). The Kryo one is simpler as you simply have to provide the to/from bytes.
No special considerations per se, keep in mind that this is different from the checkpointing/savepoints which would also go to the File system (and use the same defined serialization). With RocksDB every write into it will go through the serialization, so the data going into it would be encrypted if you set it up that way.
As a separate note, keep in mind that there is a good chance that anybody that can read the file system can also read the configuration defining the encryption keys unless you are somehow passing that to Flink at startup remotely (in which case, I am not sure if Flink would be able to restore state without additional special code).
Upvotes: 1