Kramer Li

Reputation: 2486

Can we change the unit of the Spark Streaming batch interval?

When we initialize a Spark Streaming context, we use code like:

from pyspark.streaming import StreamingContext

ssc = StreamingContext(sc, 1)  # sc is an existing SparkContext

The 1 here is the batch interval, meaning 1 second. The unit of the batch interval is time (seconds). But can we change the interval to something else, for example, a number of files?

Say we have a folder where files come in, but we do not know when they will arrive. What we want is to process each file as soon as it appears, so the interval would not be a specific time range; I hope it can be a number of files instead.

Can we do that?

Upvotes: 0

Views: 766

Answers (1)

Marius Soutier

Reputation: 11284

That's not possible. Spark Streaming essentially executes batch jobs repeatedly in a given time interval. Additionally, all window operations are time-based as well, so the notion of time cannot be ignored in Spark Streaming.

In your case, you would optimize the job for the lowest processing time possible and simply accept several batches with 0 records when no new files are available.

Upvotes: 2
