Reputation: 203
I run a service where users can publicly upload files to and download files from our site, backed by Amazon S3. Last month we had a problem where a user uploaded a file that was downloaded like crazy, resulting in 170 TB of bandwidth and a huge bill.
From talking to Amazon and searching on Stack Overflow, the way to ensure this doesn't happen again seems to be to download the S3 logs, parse them, and take action from there.
We could build such a script ourselves, but I assume there is already an open-source project or third-party service that provides this?
Upvotes: 1
Views: 2044
Reputation: 6813
What about:
Create a CloudFront Distribution for downloads
Set up a CloudWatch alarm that is triggered when the distribution's BytesDownloaded metric exceeds your chosen monthly limit
Add a notification (sent to an SNS topic you create) that is triggered when the alarm fires
Add a Lambda function that is triggered by SNS when a notification is sent to that topic (the SNS topic should of course also have your email address subscribed, so you receive an email with the alarm)
In the Lambda function, write code that uses the AWS SDK to update the CloudFront distribution and set its Enabled value to false (see the sketch below)
(You could also create a notification that is fired when the alarm's state changes back to OK and trigger a Lambda function that re-enables the distribution)
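For illustration, a minimal sketch of such a Lambda handler in Python with boto3. It assumes the distribution ID is passed in through an environment variable (the DISTRIBUTION_ID name is just an example); update_distribution needs the full current config plus its ETag as IfMatch.
import os
import boto3

cloudfront = boto3.client('cloudfront')

# Example only: the distribution ID is assumed to be supplied as an environment variable.
DISTRIBUTION_ID = os.environ['DISTRIBUTION_ID']

def handler(event, context):
    # Fetch the current distribution config together with its ETag;
    # update_distribution requires both (the ETag goes into IfMatch).
    response = cloudfront.get_distribution_config(Id=DISTRIBUTION_ID)
    config = response['DistributionConfig']
    etag = response['ETag']

    if config['Enabled']:
        config['Enabled'] = False
        cloudfront.update_distribution(
            DistributionConfig=config,
            Id=DISTRIBUTION_ID,
            IfMatch=etag,
        )
    return {'disabled': True}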
Upvotes: 2
Reputation: 1105
We had a similar requirement a while ago.
We used CloudTrail logs to figure out all the activities being performed on our AWS account.
I hope this script for downloading and filtering CloudTrail logs helps you out. (The following script only extracts launched instance IDs, the owner and the event name; please modify it according to your needs.)
import boto3
import gzip
import os
import json

client = boto3.client('s3')
bucketname = "mybucketname"
list_bucket_objects = client.list_objects(Bucket=bucketname)
download_path = '/home/ec2-user/cloudtrail/'

# DOWNLOADING: download the CloudTrail log files from S3
for obj in list_bucket_objects['Contents']:
    print(obj['Key'])
    object_name = obj['Key'].split('/')
    if len(object_name) == 8:
        print("Downloading ---> %s" % object_name[7])
        client.download_file(bucketname, obj['Key'], download_path + object_name[7])

# UNZIPPING: unzip the downloaded files into one folder
file_path = '/home/ec2-user/cloudtrail/'
new_file_path = '/home/ec2-user/cloudtrail/logs/'

# Create the log directory if it does not already exist
if not os.path.exists(new_file_path):
    os.mkdir(new_file_path)

files = os.listdir(file_path)
for file in files:
    if os.path.isfile(file_path + file):
        with gzip.GzipFile(file_path + file, 'rb') as f:
            s = f.read()
        split_file = file.split('.')
        log_path = new_file_path + split_file[0]
        print(log_path)
        with open(log_path, 'wb') as out:
            out.write(s)

        # PARSING AND FILTERING: parse the unzipped JSON, filter for RunInstances
        # events and append them to result.txt in the current working directory
        content = json.loads(open(log_path).read())
        for record in content['Records']:
            event = record['eventName']
            user = record['userIdentity'].get('userName', '')
            res_ele = record.get('responseElements')
            if res_ele and 'instancesSet' in res_ele and 'items' in res_ele['instancesSet']:
                instance_id = res_ele['instancesSet']['items'][0]['instanceId']
                if event == "RunInstances" and instance_id != "":
                    with open('result.txt', 'a') as result:
                        result.write(event + ": :" + user + ": :" + instance_id + "\n")
Upvotes: 0
Reputation: 46859
My solution to this, and to problems like this, is to have billing alerts on my account. I know roughly how much I should spend each month and set up alerts accordingly: roughly, I have divided that amount by 4 (weeks) and set a series of billing alerts at 1/4, 1/2, 3/4 and 1x my estimated spend.
This is not a technical solution to stop the downloads, but at least someone will get notified and they can take action before it gets out of control.
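If you want to script it, a minimal boto3 sketch of that series of alarms could look like the following; the monthly spend figure and SNS topic ARN are placeholders, and billing alarms require the "Receive Billing Alerts" preference to be enabled and must be created in us-east-1.
import boto3

# Billing metrics are only published in us-east-1.
cloudwatch = boto3.client('cloudwatch', region_name='us-east-1')

ESTIMATED_MONTHLY_SPEND = 400.0  # placeholder, in USD
SNS_TOPIC_ARN = 'arn:aws:sns:us-east-1:123456789012:billing-alerts'  # placeholder

for fraction in (0.25, 0.5, 0.75, 1.0):
    cloudwatch.put_metric_alarm(
        AlarmName='billing-%d-percent' % int(fraction * 100),
        Namespace='AWS/Billing',
        MetricName='EstimatedCharges',
        Dimensions=[{'Name': 'Currency', 'Value': 'USD'}],
        Statistic='Maximum',
        Period=21600,            # 6 hours; the billing metric only updates a few times a day
        EvaluationPeriods=1,
        Threshold=ESTIMATED_MONTHLY_SPEND * fraction,
        ComparisonOperator='GreaterThanThreshold',
        AlarmActions=[SNS_TOPIC_ARN],
    )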
Upvotes: 1
Reputation: 14523
Your best approach is to distribute your S3 content using Amazon CloudFront, put AWS WAF (Web Application Firewall) in front of it, and implement IP blocking.
So if an IP hits your CloudFront distribution more than, say, 5 times, AWS WAF will block that IP.
Here is the detailed guide.
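One way to set this up today is a WAFv2 rate-based rule; the sketch below is illustrative only (the names and the limit are placeholders, and AWS enforces a minimum request limit per 5-minute window, so a threshold as low as 5 requests is not possible). The resulting web ACL still has to be associated with the CloudFront distribution afterwards.
import boto3

# Web ACLs with Scope='CLOUDFRONT' must be created in us-east-1.
waf = boto3.client('wafv2', region_name='us-east-1')

waf.create_web_acl(
    Name='downloads-rate-limit',                 # placeholder name
    Scope='CLOUDFRONT',
    DefaultAction={'Allow': {}},
    Rules=[
        {
            'Name': 'block-heavy-downloaders',   # placeholder name
            'Priority': 0,
            # Block an IP once it exceeds the request limit within a 5-minute window.
            'Statement': {
                'RateBasedStatement': {
                    'Limit': 1000,               # placeholder limit
                    'AggregateKeyType': 'IP',
                },
            },
            'Action': {'Block': {}},
            'VisibilityConfig': {
                'SampledRequestsEnabled': True,
                'CloudWatchMetricsEnabled': True,
                'MetricName': 'block-heavy-downloaders',
            },
        },
    ],
    VisibilityConfig={
        'SampledRequestsEnabled': True,
        'CloudWatchMetricsEnabled': True,
        'MetricName': 'downloads-rate-limit',
    },
)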
Upvotes: 0