Ankit Khettry

Reputation: 1027

What happens if a Spark-streaming application encounters a HUGE file?

Let's consider the code below:

import org.apache.spark.streaming.{Seconds, StreamingContext}

val streamingContext = new StreamingContext(sparkConf, Seconds(frequency))
val stream = streamingContext.textFileStream("/abc/def")

What would happen if, say, a one terabyte file suddenly lands in this directory? How is it handled, or how does it fail?

On a related note, what happens if Spark fails to keep up with the speed of the incoming data?

Upvotes: 0

Views: 51

Answers (1)

Bhavesh

Reputation: 919

Spark Streaming receives data streams and divides the data into batches, which are then processed by the Spark engine to generate the final stream of results in batches.

This should not break processing: Spark keeps incoming data queued for processing, and if a batch takes longer than the batch interval, the queue of pending batches grows.
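To keep that queue from growing without bound, Spark Streaming offers a backpressure setting that adapts the ingestion rate to the observed processing rate. A minimal configuration sketch (the application name and rate value are illustrative, not from the question; note that `textFileStream` is receiver-less and ingests whole new files per batch, so a huge file arrives as one large batch regardless of receiver rate limits):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Illustrative configuration: with backpressure enabled, Spark throttles
// receiver-based sources so the batch queue does not grow without bound.
val sparkConf = new SparkConf()
  .setAppName("FileStreamExample") // hypothetical app name
  .set("spark.streaming.backpressure.enabled", "true")
  // Optional hard cap on records per second, per receiver:
  .set("spark.streaming.receiver.maxRate", "10000")

val streamingContext = new StreamingContext(sparkConf, Seconds(10))
```

For a file-based stream like the one in the question, the practical mitigation is different: split huge files upstream, or move new files into the watched directory atomically so a batch never picks up a partially written terabyte file.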

Checkpointing provides the failover mechanism: if the driver fails, the application can be restarted from the checkpointed state instead of losing progress.
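A minimal sketch of driver-failure recovery with `StreamingContext.getOrCreate`, assuming a reachable HDFS checkpoint directory (the path is hypothetical):

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Factory invoked only when no checkpoint exists yet; on restart after a
// failure, the context is rebuilt from the checkpoint data instead.
def createContext(): StreamingContext = {
  val ssc = new StreamingContext(sparkConf, Seconds(frequency))
  ssc.checkpoint("hdfs:///checkpoints/app") // hypothetical directory
  // ... define the DStream graph (e.g. textFileStream) here ...
  ssc
}

val ssc = StreamingContext.getOrCreate("hdfs:///checkpoints/app", createContext _)
ssc.start()
ssc.awaitTermination()
```

The DStream transformations must be defined inside the factory function; graphs defined after `getOrCreate` returns are not recoverable from the checkpoint.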

Note: in the extreme case, if the cluster cannot handle the input rate, the application will fail; how much it can absorb depends on your cluster's capacity.


Upvotes: 1
