enrico
enrico

Reputation: 535

When does Flink use of local storage

I'm trying to understand and manage Flink's usage of local storage as in my use-case I need to ensure no data is stored without encryption.

Reading through the documentation I can see that

Is there any other situation that need to be considered in which local storage is used? The folder in both of the cases above seems driven by the property taskmanager.tmp.dirs

Is there anyone that could point me at which class should I look at if I want to write a specific data serializer/write for the case above that includes encryption?

Upvotes: 1

Views: 533

Answers (1)

Joshua DeWald
Joshua DeWald

Reputation: 3219

The two main options for custom serialization are either to create a Kryo serializer and register it or to create a full TypeInformation + TypeSerializer (Flink-native). The Kryo one is simpler as you simply have to provide the to/from bytes.

No special considerations per se, keep in mind that this is different from the checkpointing/savepoints which would also go to the File system (and use the same defined serialization). With RocksDB every write into it will go through the serialization, so the data going into it would be encrypted if you set it up that way.

As a separate note, keep in mind that there is a good chance that anybody that can read the file system can also read the configuration defining the encryption keys unless you are somehow passing that to Flink at startup remotely (in which case, I am not sure if Flink would be able to restore state without additional special code).

Upvotes: 1

Related Questions