Raf

Reputation: 842

Throttle concurrent HTTP requests from Spark executors

I want to make some HTTP requests from inside a Spark job to a rate-limited API. To keep track of the number of concurrent requests in a non-distributed system (in Scala), the following works:

How are such things typically handled?

Upvotes: 7

Views: 2564

Answers (1)

shay__

Reputation: 3990

You shouldn't try to synchronise requests across Spark executors/partitions. That goes against Spark's concurrency model.

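Instead, for example, divide the global rate limit R by the total number of task slots, e * c (executors times cores per executor), and use mapPartitions to send the requests from each partition within its own R/(e*c) rate limit. Since by default each task occupies one core, at most e * c partitions are processed at the same time, so the combined rate stays within R.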
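A minimal sketch of the per-partition approach in Scala, assuming each task issues its requests sequentially and occupies a single core (Spark's default `spark.task.cpus=1`). `PartitionThrottle`, `perTaskRate` and `throttled` are hypothetical helper names, not part of Spark, and the throttle simply spaces calls evenly rather than implementing a full token bucket. Because each task makes at most one request at a time, this also caps total concurrency at e * c.

```scala
// Hypothetical helper (not a Spark API): per-partition rate limiting for use inside mapPartitions.
object PartitionThrottle {

  /** Share of a global rate limit R given to each concurrently running task: R / (e * c). */
  def perTaskRate(globalRatePerSecond: Double, executors: Int, coresPerExecutor: Int): Double = {
    require(executors > 0 && coresPerExecutor > 0, "executors and cores must be positive")
    globalRatePerSecond / (executors * coresPerExecutor)
  }

  /** Lazily applies `f` to each element, starting consecutive calls at least 1/permitsPerSecond apart. */
  def throttled[A, B](it: Iterator[A], permitsPerSecond: Double)(f: A => B): Iterator[B] = {
    require(permitsPerSecond > 0, "rate must be positive")
    val intervalNanos = (1e9 / permitsPerSecond).toLong
    var nextAllowed = System.nanoTime() // earliest time the next call may start

    it.map { a =>
      // Block only this task's thread until its next slot; other tasks are unaffected.
      var waitNanos = nextAllowed - System.nanoTime()
      while (waitNanos > 0) {
        Thread.sleep(waitNanos / 1000000L, (waitNanos % 1000000L).toInt)
        waitNanos = nextAllowed - System.nanoTime()
      }
      nextAllowed = System.nanoTime() + intervalNanos
      f(a)
    }
  }
}
```

In a job this could be used as `rdd.mapPartitions(it => PartitionThrottle.throttled(it, PartitionThrottle.perTaskRate(r, e, c))(callApi))`, where `callApi` is a hypothetical function performing one HTTP request, and `e` and `c` might be read from `spark.executor.instances` and `spark.executor.cores`. With dynamic allocation the executor count can change at runtime, so a fixed e would then be only an estimate.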

Upvotes: 9
