Reputation: 199
Is there any special treatment I need to consider if I use an EBS volume for the local store?
To be clearer: say I run 1 job manager and 5 task managers (1 task manager per instance) on AWS EKS, with an EBS volume for the local store (RocksDB) and checkpoints written to S3. One of the task manager instances (call it X) is rebooted for some reason, a new node (call it Y) joins, and its local store is restored from S3. Now suppose node Y leaves EKS and node X rejoins the Flink cluster. What happens in this case? Node X still has persisted state on its volume, but that state is old. Will Flink delete the old state and download the most recent state from S3? How does Flink handle this internally?
Are there any pitfalls to using an EBS volume for the local store?
Upvotes: 0
Views: 738
Reputation: 43499
First of all, it's better to use local SSD than EBS volumes. EBS connects to the instance via the network, so RocksDB's access to its local state competes for network bandwidth with everything else the instance does over the network. Local SSDs perform better, and the fact that they are ephemeral doesn't matter, because lost state is restored from the checkpoint in S3.
If you configure local recovery, then any surviving nodes will use their local state rather than fetching the latest snapshot from S3, while any new nodes will retrieve their state from S3. If X, a former member of the cluster, rejoins after having spent some time in exile, it will delete whatever state it had, and fetch the state it needs from S3.
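For reference, a setup like the one described might be configured roughly as follows. This is only a sketch: the key names are the ones used in Flink 1.x `flink-conf.yaml` (newer releases have renamed some of them, so check the docs for your version), and the bucket name and mount paths are hypothetical placeholders.

```yaml
# flink-conf.yaml (Flink 1.x key names; verify against your Flink version)

# Keep working state in RocksDB on the task manager's local disk
state.backend: rocksdb

# Durable checkpoints go to S3 (hypothetical bucket name)
state.checkpoints.dir: s3://my-flink-bucket/checkpoints

# Upload only the changes since the previous checkpoint
state.backend.incremental: true

# Keep a local copy of checkpointed state so surviving task managers
# can recover without downloading from S3
state.backend.local-recovery: true

# Where RocksDB keeps its working files (hypothetical local SSD mount)
state.backend.rocksdb.localdir: /mnt/local-ssd/rocksdb

# Where the local recovery copies are stored (hypothetical path)
taskmanager.state.local.root-dirs: /mnt/local-ssd/local-recovery
```

With local recovery left at its default (disabled), every task manager restores from S3 after a failure, whether or not it survived.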
Upvotes: 3