Reputation: 1251
A third-party application uploads around 10,000 objects a day to my bucket + prefix. My requirement is to fetch all objects that were uploaded to this bucket + prefix in the last 24 hours.
There are a lot of objects under this bucket + prefix.
So I assume that when I call
response = s3_paginator.paginate(Bucket=bucket, Prefix='inside-bucket-level-1/', PaginationConfig={"PageSize": 1000})
it makes multiple calls to the S3 API, and maybe that is why it shows the Rate Exceeded error.
Below is my Python Lambda Function.
import json
import boto3
import time
from datetime import datetime, timedelta


def lambda_handler(event, context):
    s3 = boto3.client("s3")
    from_date = datetime.today() - timedelta(days=1)
    string_from_date = from_date.strftime("%Y-%m-%d, %H:%M:%S")
    print("Date :", string_from_date)
    s3_paginator = s3.get_paginator('list_objects_v2')
    list_of_buckets = ['kush-dragon-data']
    bucket_wise_list = {}
    for bucket in list_of_buckets:
        response = s3_paginator.paginate(Bucket=bucket, Prefix='inside-bucket-level-1/', PaginationConfig={"PageSize": 1000})
        filtered_iterator = response.search(
            "Contents[?to_string(LastModified)>='\"" + string_from_date + "\"'].Key")
        keylist = []
        for key_data in filtered_iterator:
            if "/" in key_data:
                splitted_array = key_data.split("/")
                if len(splitted_array) > 1:
                    if splitted_array[-1]:
                        keylist.append(splitted_array[-1])
            else:
                keylist.append(key_data)
        bucket_wise_list.update({bucket: keylist})
    print("Total Number Of Object = ", bucket_wise_list)
    # TODO implement
    return {
        'statusCode': 200,
        'body': json.dumps(bucket_wise_list)
    }
When I execute the above Lambda function, it shows the error below:
"Calling the invoke API action failed with this message: Rate Exceeded."
Can anyone help me resolve this error and achieve my requirement?
Upvotes: 0
Views: 10540
Reputation: 1267
This is most likely due to you reaching your quota limit for AWS S3 API calls. The "bigger hammer" solution is to request a quota increase, but if you don't want to do that, there is another way: use the built-in retries of botocore.Config, for example:
import json
import time
from datetime import datetime, timedelta
from boto3 import client
from botocore.config import Config

config = Config(
    retries={
        'max_attempts': 10,
        'mode': 'standard'
    }
)


def lambda_handler(event, context):
    s3 = client('s3', config=config)
    ### ALL OF YOUR CURRENT PYTHON CODE EXACTLY THE WAY IT IS ###
This config will use an exponentially increasing sleep timer for a maximum number of retries. Per the docs, there is also an adaptive mode which is still experimental. For more info, see the botocore.Config docs on retries.
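If you want to try that adaptive mode, the config looks the same apart from the mode value; this is my own minimal sketch, not from the docs:
from botocore.config import Config

# Sketch: adaptive mode adds experimental client-side rate limiting on top of retries.
adaptive_config = Config(
    retries={
        'max_attempts': 10,
        'mode': 'adaptive'
    }
)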
Another (much less robust, IMO) option would be to write your own paginator with a sleep programmed in, though you'd probably just want to use the built-in backoff in 99.99% of cases (even if you do have to write your own paginator). Note that this code is untested and isn't asynchronous, so the sleep will be in addition to the wait time for a page response. To make the "sleep time" exactly sleep_secs, you'd need to use concurrent.futures or asyncio (AWS's built-in paginators mostly use concurrent.futures):
from boto3 import client
from typing import Generator
from time import sleep


def get_pages(bucket: str, prefix: str, page_size: int, sleep_secs: float) -> Generator:
    s3 = client('s3')
    # First page
    page: dict = s3.list_objects_v2(
        Bucket=bucket,
        MaxKeys=page_size,
        Prefix=prefix
    )
    next_token: str = page.get('NextContinuationToken')
    yield page
    # Subsequent pages, sleeping between each request
    while next_token:
        sleep(sleep_secs)
        page = s3.list_objects_v2(
            Bucket=bucket,
            MaxKeys=page_size,
            Prefix=prefix,
            ContinuationToken=next_token
        )
        next_token = page.get('NextContinuationToken')
        yield page
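As a usage sketch (my own untested addition; the helper name is hypothetical), you could then filter each page by LastModified to keep only the keys from the last 24 hours:
from datetime import datetime, timedelta, timezone

def keys_modified_last_24h(bucket: str, prefix: str) -> list:
    # Hypothetical helper: walk the pages and keep keys newer than 24 hours ago.
    cutoff = datetime.now(timezone.utc) - timedelta(days=1)
    keys = []
    for page in get_pages(bucket, prefix, page_size=1000, sleep_secs=1.0):
        for obj in page.get('Contents', []):
            if obj['LastModified'] >= cutoff:  # LastModified is a timezone-aware datetime
                keys.append(obj['Key'])
    return keys

# keys = keys_modified_last_24h('kush-dragon-data', 'inside-bucket-level-1/')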
Upvotes: 0
Reputation: 645
This is probably due to your account's API rate limits. You should add a retry with a few seconds of delay between attempts, or increase the page size.
Upvotes: 0
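A rough sketch of the retry-with-delay idea (my own illustration; the helper name and the throttling error codes it checks are assumptions, not from this answer):
import time
from botocore.exceptions import ClientError

def list_page_with_retry(s3, max_retries=5, wait_secs=2, **kwargs):
    # Hypothetical helper: retry a single ListObjectsV2 call when S3 throttles the request.
    for attempt in range(max_retries):
        try:
            return s3.list_objects_v2(**kwargs)
        except ClientError as e:
            code = e.response.get('Error', {}).get('Code', '')
            if code in ('Throttling', 'ThrottlingException', 'SlowDown') and attempt < max_retries - 1:
                time.sleep(wait_secs * (attempt + 1))  # wait a bit longer after each failure
            else:
                raise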