Reputation: 2316
My application writes objects to an S3 bucket heavily. I am trying to find out the data transferred into S3 for this bucket, in bytes per minute.
I could use the BucketSizeBytes metric and look at the rate of its increase, but I also have automatic deletion of expired objects, so I cannot rely on BucketSizeBytes.
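For reference, a rough sketch of how I pull BucketSizeBytes from CloudWatch today (the bucket name is a placeholder); it is a daily storage metric, so its rate of increase is coarse and skewed by the lifecycle deletions:

import boto3
from datetime import datetime, timedelta

cw = boto3.client('cloudwatch')

# BucketSizeBytes is reported once per day per storage class
response = cw.get_metric_statistics(
    Namespace='AWS/S3',
    MetricName='BucketSizeBytes',
    Dimensions=[
        {'Name': 'BucketName', 'Value': 'my-bucket'},         # placeholder bucket
        {'Name': 'StorageType', 'Value': 'StandardStorage'},
    ],
    StartTime=datetime.utcnow() - timedelta(days=7),
    EndTime=datetime.utcnow(),
    Period=86400,
    Statistics=['Average'],
)
for point in sorted(response['Datapoints'], key=lambda p: p['Timestamp']):
    print(point['Timestamp'], point['Average'])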
Is there any other way to get the amount of data transferred into S3 for a bucket?
Note that I cannot get this info at the application level; think of it as a 3rd-party black-box app.
Upvotes: 1
Views: 1614
Reputation:
# Working example of how to push CloudWatch metric data to S3 via Lambda
# https://www.youtube.com/watch?v=Jwl70bRJ6yg&t=72s
import boto3
import logging
import json
import uuid
from datetime import datetime, timedelta

# set up simple logging for INFO
logger = logging.getLogger()
logger.setLevel(logging.INFO)

# define the connections
ec2 = boto3.resource('ec2')
cw = boto3.client('cloudwatch')
s3client = boto3.client('s3')


def lambda_handler(event, context):
    # Use the filter() method of the instances collection to retrieve
    # all running EC2 instances.
    filters = [{
        'Name': 'instance-state-name',
        'Values': ['running']
    }]
    instances = ec2.instances.filter(Filters=filters)

    for instance in instances:
        for tag in instance.tags:
            if tag["Value"] == 'DEV':
                ec2_instance_id = instance.id
                # list all metrics reported for this instance
                metrics_list_response = cw.list_metrics(
                    Dimensions=[{'Name': 'InstanceId', 'Value': ec2_instance_id}])
                metrics_response = get_metrics(metrics_list_response, cw)
                instance_data = json.dumps(metrics_response, default=datetime_handler)
                logger.info(metrics_response)
                # write the metric data to S3 under a per-instance prefix
                bucket_name = "myec2metricsdata"
                filename = str(uuid.uuid4()) + "__" + ec2_instance_id + '_InstanceMetrics.json'
                key = ec2_instance_id + "/" + filename
                s3client.put_object(Bucket=bucket_name, Key=key, Body=instance_data)


def datetime_handler(x):
    """JSON serializer for the datetime objects returned by CloudWatch."""
    if isinstance(x, datetime):
        return x.isoformat()
    raise TypeError("Unknown type")


def get_metrics(metrics_list_response, cw):
    """
    Retrieve the metric data from CloudWatch.
    Iterate through the list of available metrics and build one metric data query per metric.
    """
    metric_data_queries = []
    for metric in metrics_list_response.get('Metrics'):
        namespace = metric.get("Namespace")
        dimensions = metric.get("Dimensions")
        metric_name = metric.get("MetricName")
        metric_id = metric_name
        if metric_name == 'DiskSpaceUtilization':
            for dimension in dimensions:
                if dimension.get("Name") == "Filesystem":
                    # If the metric is for a disk, include the file system in the query id
                    file_system = dimension.get("Value")
                    metric_id = metric_name + file_system.replace("/", "_")
                    break
        metric_data_queries.append({
            "Id": metric_id.lower(),
            "MetricStat": {
                "Metric": {
                    "Namespace": namespace,
                    "MetricName": metric_name,
                    "Dimensions": dimensions
                },
                "Period": 300,
                "Stat": "Average"
            },
            "Label": metric_name + "Response",
            "ReturnData": True
        })

    # fetch the last 5 minutes of data for all queries in a single call
    metrics_response = cw.get_metric_data(
        MetricDataQueries=metric_data_queries,
        StartTime=datetime.now() - timedelta(minutes=5),
        EndTime=datetime.now()
    )
    return metrics_response
Upvotes: 0
Reputation: 269410
There are no metrics available for rate of data in/out of Amazon S3.
You could create an AWS Lambda function, triggered on the creation of new objects in the bucket.
The function could record details of new objects created, such as their size and the time. This information could be stored in a database or even in a custom metric in Amazon CloudWatch. You could then use that data to calculate a "new GB per hour" type of metric for files being created in the S3 bucket.
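A minimal sketch of such a function, assuming an ObjectCreated trigger on the bucket; the custom namespace and metric name are only examples:

import boto3

cloudwatch = boto3.client('cloudwatch')

def lambda_handler(event, context):
    # each record describes one newly created object
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        size_bytes = record['s3']['object']['size']
        # publish the object size as a custom metric (names are examples)
        cloudwatch.put_metric_data(
            Namespace='Custom/S3Ingest',
            MetricData=[{
                'MetricName': 'BytesUploaded',
                'Dimensions': [{'Name': 'BucketName', 'Value': bucket}],
                'Value': size_bytes,
                'Unit': 'Bytes',
            }]
        )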
If storing in CloudWatch, use a SUM() aggregate to add up the sizes over the given period of time.
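For example, querying that custom metric with the Sum statistic at a 60-second period would give bytes per minute (the names below match the sketch above and are only examples):

import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client('cloudwatch')

response = cloudwatch.get_metric_statistics(
    Namespace='Custom/S3Ingest',
    MetricName='BytesUploaded',
    Dimensions=[{'Name': 'BucketName', 'Value': 'my-bucket'}],   # placeholder bucket
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=60,                  # one-minute buckets
    Statistics=['Sum'],
    Unit='Bytes',
)
for point in sorted(response['Datapoints'], key=lambda p: p['Timestamp']):
    print(point['Timestamp'], point['Sum'], 'bytes per minute')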
Upvotes: 1