vrtx54234

Reputation: 2316

AWS CloudWatch metric for data transfer into an S3 bucket per min

My application writes objects to an S3 bucket heavily. I am trying to find the network data transfer in bytes/min into S3 for this bucket.

I could use the BucketSizeBytes metric and look at its rate of increase, but I also have automatic deletion of expired objects, so BucketSizeBytes alone won't give me the transfer rate.

Is there any other way to get the amount of data transferred into S3 for a bucket?

Note that I can't get this info at the application level; think of it as a 3rd-party black-box app.

Upvotes: 1

Views: 1614

Answers (2)

user15117824

Reputation:

# Working code for pushing CloudWatch data to S3 via Lambda
# https://www.youtube.com/watch?v=Jwl70bRJ6yg&t=72s


import boto3
import logging
from datetime import datetime
from datetime import timedelta
import json
import uuid

# setup simple logging for INFO
logger = logging.getLogger()
logger.setLevel(logging.INFO)

# define the AWS service connections
ec2 = boto3.resource('ec2')
cw = boto3.client('cloudwatch')
s3client = boto3.client('s3')


def lambda_handler(event, context):
    # Use the filter() method of the instances collection to retrieve
    # all running EC2 instances.
    filters = [{
        'Name': 'instance-state-name',
        'Values': ['running']
    }]

    # filter the instances
    instances = ec2.instances.filter(Filters=filters)

    for instance in instances:
        # instance.tags is None when an instance has no tags
        for tags in instance.tags or []:
            if tags["Value"] == 'DEV':
                ec2_instance_id = instance.id

                # list all CloudWatch metrics that have this instance as a dimension
                metrics_list_response = cw.list_metrics(
                    Dimensions=[{'Name': 'InstanceId', 'Value': ec2_instance_id}])

                metrics_response = get_metrics(metrics_list_response, cw)
                instance_data = json.dumps(metrics_response, default=datetime_handler)
                logger.info(metrics_response)

                # write the metrics snapshot to S3 under a per-instance prefix
                bucket_name = "myec2metricsdata"
                filename = str(uuid.uuid4()) + "__" + ec2_instance_id + '_InstanceMetrics.json'
                key = ec2_instance_id + "/" + filename
                s3client.put_object(Bucket=bucket_name, Key=key, Body=instance_data)


def datetime_handler(x):
    """Serialize datetime objects when dumping CloudWatch responses to JSON."""
    if isinstance(x, datetime):
        return x.isoformat()
    raise TypeError("Unknown type")

def get_metrics(metrics_list_response, cw):
    """
    Retrieve metric data from CloudWatch.
    Iterates through the list of available metrics and builds a
    GetMetricData query for each one.
    """
    metric_data_queries = []
    metrics_list = metrics_list_response.get('Metrics')
    for metric in metrics_list:
        namespace = metric.get("Namespace")
        dimensions = metric.get("Dimensions")
        metric_name = metric.get("MetricName")
        metric_id = metric_name
        if metric_name == 'DiskSpaceUtilization':
            for dimension in dimensions:
                if dimension.get("Name") == "Filesystem":
                    # If the metric is for a disk, include the file system
                    # in the id so each file system gets its own query
                    file_system = dimension.get("Value")
                    metric_id = metric_name + file_system.replace("/", "_")
                    break

        metric_data_query = {
            "Id": metric_id.lower(),  # GetMetricData ids must start with a lowercase letter
            "MetricStat": {
                "Metric": {"Namespace": namespace,
                           "MetricName": metric_name,
                           "Dimensions": dimensions},
                "Period": 300,
                "Stat": "Average"
            },
            "Label": metric_name + "Response",
            "ReturnData": True
        }
        metric_data_queries.append(metric_data_query)

    # CloudWatch expects UTC timestamps; query the last five minutes
    metrics_response = cw.get_metric_data(
        MetricDataQueries=metric_data_queries,
        StartTime=datetime.utcnow() + timedelta(minutes=-5),
        EndTime=datetime.utcnow()
    )

    return metrics_response

Upvotes: 0

John Rotenstein

Reputation: 269410

There are no metrics available for rate of data in/out of Amazon S3.

You could create an AWS Lambda function, triggered on the creation of new objects in the bucket.

The function could record details of new objects created, such as their size and the time. This information could be stored in a database or even in a custom metric in Amazon CloudWatch. You could then use that data to calculate a "new GB per hour" type of metric for files being created in the S3 bucket.
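As a rough sketch, a Lambda function subscribed to the bucket's s3:ObjectCreated:* events might look like the following. The Custom/S3 namespace, BytesUploaded metric name and BucketName dimension are arbitrary choices for illustration, not AWS-defined names.

import boto3

cloudwatch = boto3.client('cloudwatch')

def lambda_handler(event, context):
    # An s3:ObjectCreated:* event carries one or more records, each
    # describing a newly created object, including its size in bytes
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        size = record['s3']['object']['size']

        # Publish the object size as a custom CloudWatch metric
        cloudwatch.put_metric_data(
            Namespace='Custom/S3',       # illustrative namespace
            MetricData=[{
                'MetricName': 'BytesUploaded',  # illustrative metric name
                'Dimensions': [{'Name': 'BucketName', 'Value': bucket}],
                'Value': size,
                'Unit': 'Bytes',
            }]
        )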

If storing in CloudWatch, use the Sum statistic to add up the sizes over the given period of time, as in the sketch below.
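For example, a minimal sketch of querying that custom metric at one-minute resolution, assuming the same illustrative names as above:

import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client('cloudwatch')

# Sum the uploaded bytes per minute over the last hour
response = cloudwatch.get_metric_statistics(
    Namespace='Custom/S3',
    MetricName='BytesUploaded',
    Dimensions=[{'Name': 'BucketName', 'Value': 'my-bucket'}],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=60,           # one datapoint per minute
    Statistics=['Sum'],
    Unit='Bytes',
)

for point in sorted(response['Datapoints'], key=lambda p: p['Timestamp']):
    print(point['Timestamp'], point['Sum'], 'bytes/min')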

Upvotes: 1
