Reputation: 3191
I have a task where, on a scheduled basis, I need to check the number of files in a bucket (the files are uploaded via a NAS) and then e-mail the total using SES.
The e-mail part on its own is working fine. However, since I have over 40,000 files in the bucket, it takes 5 minutes or more to return the total file count.
From a design perspective, is it better to put this part of the logic on an EC2 instance and schedule the action there? Or are there better ways to do this?
Note that I don't have to list all the files. I simply want the total count of files in the bucket.
Upvotes: 0
Views: 2219
Reputation: 270134
You did not mention how often you need to do this file count.
If it is daily or less often, you can activate Amazon S3 Inventory. It can provide a daily dump of all files in a bucket, and you could count the entries in that dump instead of listing the bucket yourself.
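As a rough sketch of the counting step, assuming the inventory is configured for CSV output (gzip-compressed data files with no header row, listed under `files` in the report's `manifest.json`): the function names, the bucket name, and the manifest key below are hypothetical, and `s3_client` is expected to be something like `boto3.client("s3")`.

```python
import csv
import gzip
import io
import json


def count_inventory_rows(gz_bytes: bytes) -> int:
    """Count the rows in one gzip-compressed CSV inventory data file."""
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", newline="") as fh:
        # Assumption: one row per object and no header row.
        return sum(1 for _ in csv.reader(fh))


def count_from_manifest(s3_client, dest_bucket: str, manifest_key: str) -> int:
    """Sum the row counts of every data file listed in an inventory manifest.

    s3_client only needs a get_object(Bucket=..., Key=...) method that
    returns {"Body": <readable stream>}, as the boto3 S3 client does.
    """
    manifest_body = s3_client.get_object(Bucket=dest_bucket, Key=manifest_key)["Body"]
    manifest = json.loads(manifest_body.read())
    total = 0
    for entry in manifest["files"]:
        # Each entry's "key" points at a data file in the destination bucket.
        data = s3_client.get_object(Bucket=dest_bucket, Key=entry["key"])["Body"].read()
        total += count_inventory_rows(data)
    return total
```

The scheduled job would then call something like `count_from_manifest(boto3.client("s3"), "my-inventory-bucket", "<path to the latest manifest.json>")` and pass the result to the existing SES e-mail code. The count is only as fresh as the most recent inventory report.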
Upvotes: 0
Reputation: 3259
How about having a Lambda function triggered every time a file is put, deleted, etc.?
Based on the event received, the Lambda updates a DynamoDB table that stores the count.
For example:
When a file is added to S3, the Lambda increases the count in the DynamoDB table by 1,
and when a file is deleted, the Lambda decreases the count by 1.
This way, I guess, you will always have the latest count without ever having to count the files.
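A minimal sketch of such a handler, assuming the bucket's event notifications for `s3:ObjectCreated:*` and `s3:ObjectRemoved:*` are wired to the function. The table name `file-counts`, its partition key `bucket`, and the attribute `file_count` are all made up for illustration.

```python
TABLE_NAME = "file-counts"     # hypothetical DynamoDB table name
COUNTER_KEY = "my-nas-bucket"  # hypothetical partition key value for this bucket


def count_delta(event: dict) -> int:
    """Return the net change in object count described by an S3 event."""
    delta = 0
    for record in event.get("Records", []):
        name = record.get("eventName", "")
        if name.startswith("ObjectCreated:"):
            delta += 1
        elif name.startswith("ObjectRemoved:"):
            delta -= 1
    return delta


def lambda_handler(event, context):
    # Imported here so count_delta can be exercised without boto3 installed.
    import boto3

    delta = count_delta(event)
    if delta:
        table = boto3.resource("dynamodb").Table(TABLE_NAME)
        # ADD is an atomic increment and creates the attribute if it is missing.
        table.update_item(
            Key={"bucket": COUNTER_KEY},
            UpdateExpression="ADD file_count :d",
            ExpressionAttributeValues={":d": delta},
        )
    return {"delta": delta}
```

Two caveats with this approach: overwriting an existing key also raises an `ObjectCreated` event, and S3 may deliver the same notification more than once, so the counter can drift over time. Seeding it from a full count and reconciling occasionally (for example against an inventory report) would keep it honest. The scheduled e-mail job then only needs a single `get_item` on the table.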
Upvotes: 1