Reputation: 907
I have an S3 json dataset that is a dump of a KMS client-side encrypted DynamoDB (i.e each record is KMS client-side encrypted independently).
I would like to use Spark to load that dataset and perform some analysis, which means I have to call KMS to decrypt each record. A UDF that simply decrypts each line works, but it hits the KMS API limit of 100 calls/sec.
I am wondering if there is some way to rate-limit these Spark map operations.
Upvotes: 2
Views: 1938
Reputation: 1483
I think this can be handled by a Spark Streaming application.
Check spark.streaming.backpressure.enabled and spark.streaming.receiver.maxRate.
spark.streaming.backpressure.enabled: Enables or disables Spark Streaming's internal backpressure mechanism (since 1.5). This enables Spark Streaming to control the receiving rate based on the current batch scheduling delays and processing times so that the system receives only as fast as it can process. Internally, this dynamically sets the maximum receiving rate of receivers. This rate is upper bounded by the values spark.streaming.receiver.maxRate and spark.streaming.kafka.maxRatePerPartition if they are set (see below).
When you want to cap the rate at 100 calls/sec, set spark.streaming.receiver.maxRate:

spark.streaming.receiver.maxRate: Maximum rate (number of records per second) at which each receiver will receive data. Effectively, each stream will consume at most this number of records per second. Setting this configuration to 0 or a negative number will put no limit on the rate. See the deployment guide in the Spark Streaming programming guide for more details.
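For example, assuming the decryption job is rewritten as a Spark Streaming application (your_streaming_app.py is a placeholder name), both settings could be passed at submit time, a minimal sketch:

```shell
# Cap each receiver at 100 records/sec (matching the KMS quota) and let
# backpressure lower the rate further if batches start falling behind.
spark-submit \
  --conf spark.streaming.backpressure.enabled=true \
  --conf spark.streaming.receiver.maxRate=100 \
  your_streaming_app.py
```

Note that maxRate applies per receiver, so if the application runs more than one receiver, the combined rate can still exceed 100 records/sec and each receiver's limit would need to be lowered accordingly.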
Upvotes: 1