Saket

Reputation: 3129

Spark streaming tab disappears after restarting from checkpoint

I have a Spark Streaming job running on a cluster (Spark 1.6) that checkpoints to S3. When I start the job for the first time, I can see the "Streaming" tab in the Spark UI. However, when I restart the job from the checkpoint, the Streaming tab disappears. The job still works as a streaming job, and I see the batches appear at the configured batch interval. See below.

Snapshot

If I clear out the checkpoint data, the tab comes back. I suspect that the Streaming tab is not registered correctly while restarting from a checkpoint.

I looked at the Spark Streaming code. Is it possible that the code path which registers the tab is not invoked when the application state is deserialised from a checkpoint?

Does anyone know how to fix this?

Upvotes: 2

Views: 1271

Answers (1)

Yuval Itzchakov

Reputation: 149538

"If I clear out the checkpoint data, the tab comes back. I suspect that the Streaming tab is not registered correctly while restarting from a checkpoint."

The registration code is invoked, but the Streaming tab doesn't appear until Spark finishes loading all the data from the S3 checkpoint location. If your lineage is long, this may take some time. Once all the data is restored from the checkpoint, you'll see the Streaming tab appear.
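For context, a restart from a checkpoint usually goes through StreamingContext.getOrCreate, which rebuilds the context from the checkpoint data if any exists and otherwise calls your factory function. Below is a minimal sketch of that pattern, not the asker's actual job: the checkpoint path, application name, batch interval and socket source are all placeholders I made up for illustration, and it assumes the Spark Streaming library is on the classpath.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CheckpointRestartSketch {
  // Hypothetical checkpoint location; substitute the job's real S3 path.
  val checkpointDir = "s3n://my-bucket/streaming-checkpoint"

  // Called only when no usable checkpoint is found in checkpointDir.
  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("CheckpointRestartSketch")
    val ssc = new StreamingContext(conf, Seconds(10)) // placeholder batch interval
    ssc.checkpoint(checkpointDir)

    // Placeholder source and output; the whole DStream graph must be
    // defined inside this function so it can be restored on restart.
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.count().print()
    ssc
  }

  def main(args: Array[String]): Unit = {
    // On a restart this restores the context from the checkpoint data
    // instead of calling createContext.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}
```

On the first run the factory builds a fresh context; on later runs the context is restored from the checkpoint, which is the step that can take a while when a lot of data has to be read back from S3.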

Upvotes: 2
