Ankit Khettry

Reputation: 1027

What happens if a Spark-streaming application encounters a HUGE file?

Let's consider the code below:

import org.apache.spark.streaming.{Seconds, StreamingContext}

val streamingContext = new StreamingContext(sparkConf, Seconds(frequency))
val stream = streamingContext.textFileStream("/abc/def")

What would happen if, say, a one terabyte file suddenly lands in this directory? How is it handled, or how does it fail?

On a related note, what happens if Spark fails to keep up with the speed of the incoming data?

Upvotes: 0

Views: 51

Answers (1)

Bhavesh

Reputation: 919

Spark Streaming receives data streams and divides the data into batches, which are then processed by the Spark engine to generate the final stream of results in batches.

This should not break processing: Spark keeps incoming data queued for processing, and if a batch takes longer than the batch interval, the queue of pending batches grows.
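To keep that queue from growing without bound, Spark Streaming offers a backpressure setting that adapts the ingestion rate to the observed processing rate. A minimal configuration sketch (the application name and rate value are illustrative, not from the question; note that `textFileStream` is receiver-less and ingests whole new files per batch, so a huge file arrives as one large batch regardless of receiver rate limits):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Illustrative configuration: with backpressure enabled, Spark throttles
// receiver-based sources so the batch queue does not grow without bound.
val sparkConf = new SparkConf()
  .setAppName("FileStreamExample") // hypothetical app name
  .set("spark.streaming.backpressure.enabled", "true")
  // Optional hard cap on records per second, per receiver:
  .set("spark.streaming.receiver.maxRate", "10000")

val streamingContext = new StreamingContext(sparkConf, Seconds(10))
```

For a file-based stream like the one in the question, the practical mitigation is different: split huge files upstream, or move new files into the watched directory atomically so a batch never picks up a partially written terabyte file.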

Checkpointing provides the failover mechanism: if the driver fails, the application can be restarted from the checkpointed state instead of losing progress.
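A minimal sketch of driver-failure recovery with `StreamingContext.getOrCreate`, assuming a reachable HDFS checkpoint directory (the path is hypothetical):

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Factory invoked only when no checkpoint exists yet; on restart after a
// failure, the context is rebuilt from the checkpoint data instead.
def createContext(): StreamingContext = {
  val ssc = new StreamingContext(sparkConf, Seconds(frequency))
  ssc.checkpoint("hdfs:///checkpoints/app") // hypothetical directory
  // ... define the DStream graph (e.g. textFileStream) here ...
  ssc
}

val ssc = StreamingContext.getOrCreate("hdfs:///checkpoints/app", createContext _)
ssc.start()
ssc.awaitTermination()
```

The DStream transformations must be defined inside the factory function; graphs defined after `getOrCreate` returns are not recoverable from the checkpoint.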

Note: in the extreme case, if the cluster cannot handle the input rate, the application will fail; how much it can absorb depends on your cluster's capacity.


Upvotes: 1
