Reputation: 381
I'm looking to count the number of times each object in my CloudFront distribution has been hit, so that I can generate an Excel sheet to track usage stats. I've been looking through the boto3 docs for CloudFront and I haven't been able to pin down where that information is exposed. I see that the AWS CloudFront console generates a 'Popular Objects' report. Does anyone know how to get the numbers that AWS generates for that report through boto3?
If it's not accessible through boto3, is there an AWS CLI command that I should use instead?
UPDATE:
Here's what I ended up using (roughly pseudo-code); hopefully it's a starting point for someone else:
import boto3
import gzip
import shutil
from datetime import datetime, date, timedelta

from xlwt import Workbook

# Configuration placeholders -- fill these in for your environment.
AWS_ACCESS_KEY_ID = '...'
PASSWORD = '...'  # the AWS secret access key
AWS_STORAGE_BUCKET_NAME = '...'  # bucket that CloudFront writes logs to
CLOUDFRONT_IDENTIFIER = '...'  # prefix of the log file names


def analyze(timeInterval):
    """
    Analyze CloudFront usage data from the access logs stored in S3.

    :param timeInterval: one of 'daily', 'weekly', 'monthly', 'yearly'
    :return: None (results are written to an .xls workbook)
    """
    cutoffs = {
        'daily': timedelta(days=1),
        'weekly': timedelta(days=7),
        'monthly': timedelta(weeks=4),
        'yearly': timedelta(weeks=52),
    }
    outputList = []
    outputDict = {}
    s3 = boto3.resource('s3',
                        aws_access_key_id=AWS_ACCESS_KEY_ID,
                        aws_secret_access_key=PASSWORD)
    bucket = s3.Bucket(AWS_STORAGE_BUCKET_NAME)
    currentDatetime = date.today()

    # create the Excel workbook/sheet that we'll save results to
    wb = Workbook()
    sheet1 = wb.add_sheet('Log Results By URL')
    sheet1.write(0, 1, 'File')
    sheet1.write(0, 2, 'Total Hit Count')
    sheet1.write(0, 3, 'Total Byte Count')

    for item in bucket.objects.all():
        # log file keys look like <prefix>.YYYY-MM-DD-HH.<unique id>.gz;
        # strip the prefix and the hour portion to recover the date
        datetimeRef = str(item.key).replace(CLOUDFRONT_IDENTIFIER + '.', '')
        datetimeRef = datetimeRef.split('.')[0]
        year, month, day = datetimeRef[:-3].split('-')
        datetimeRef = date(year=int(year), month=int(month), day=int(day))

        # file not within the requested time interval, skip it
        if datetimeRef <= currentDatetime - cutoffs[timeInterval]:
            continue

        print('Analyzing File:', item.key)
        # download and unzip the log file
        bucket.download_file(item.key, 'logFile.gz')
        with gzip.open('logFile.gz', 'rb') as f_in:
            with open('logFile.txt', 'wb') as f_out:
                shutil.copyfileobj(f_in, f_out)

        # read the text file, skipping the two '#Version'/'#Fields'
        # header lines, and add the contents to a list
        with open('logFile.txt', 'r') as f:
            outputList.extend(f.readlines()[2:])

    # iterate through the data collecting hit counts and byte size;
    # in the tab-separated log format, field 7 is cs-uri-stem and
    # field 3 is sc-bytes
    for dataline in outputList:
        fields = dataline.split('\t')
        uri, byteCount = fields[7], int(fields[3])
        if uri not in outputDict:
            outputDict[uri] = {'count': 1, 'byteCount': byteCount}
        else:
            outputDict[uri]['count'] += 1
            outputDict[uri]['byteCount'] += byteCount

    # iterate through the result dictionary and write to the Excel sheet
    for row, (uri, totals) in enumerate(outputDict.items(), start=1):
        sheet1.write(row, 1, str(uri))
        sheet1.write(row, 2, totals['count'])
        sheet1.write(row, 3, totals['byteCount'])

    # save the workbook (':' is not allowed in filenames on Windows)
    safeDateTime = str(datetime.now()).replace(':', '.')
    wb.save(timeInterval + '_Log_Result_' + safeDateTime + '.xls')


if __name__ == '__main__':
    analyze('daily')
Upvotes: 1
Views: 1103
Reputation: 270039
From Configuring and Using Standard Logs (Access Logs) - Amazon CloudFront:
You can configure CloudFront to create log files that contain detailed information about every user request that CloudFront receives. These are called standard logs, also known as access logs. These standard logs are available for both web and RTMP distributions. If you enable standard logs, you can also specify the Amazon S3 bucket that you want CloudFront to save files in.
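If you want to verify from code whether standard logging is already turned on, a minimal sketch like this should work; the distribution ID 'E2EXAMPLE' is a placeholder:

import boto3

cloudfront = boto3.client('cloudfront')

# Fetch the distribution's current configuration and inspect its
# Logging settings ('E2EXAMPLE' is a placeholder distribution ID).
config = cloudfront.get_distribution_config(Id='E2EXAMPLE')
logging = config['DistributionConfig']['Logging']

if logging['Enabled']:
    print('Logs are delivered to', logging['Bucket'],
          'with prefix', repr(logging['Prefix']))
else:
    print('Standard logging is not enabled for this distribution')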
The log files can be quite large, but you can Query Amazon CloudFront Logs using Amazon Athena.
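As a sketch of that approach, here is how you might kick off a per-object hit-count query with boto3's Athena client. This assumes you have already created the cloudfront_logs table from that guide (the uri and bytes column names follow its example DDL); the 'default' database and the results bucket are placeholders:

import boto3

athena = boto3.client('athena')

# Hit count and total bytes per object over the last day, assuming a
# 'cloudfront_logs' table created per the AWS guide. The "date" column
# is double-quoted because date is a reserved word in Athena.
query = """
    SELECT uri,
           COUNT(*)   AS hit_count,
           SUM(bytes) AS total_bytes
    FROM cloudfront_logs
    WHERE "date" >= date_add('day', -1, current_date)
    GROUP BY uri
    ORDER BY hit_count DESC
"""

response = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={'Database': 'default'},
    ResultConfiguration={'OutputLocation': 's3://my-athena-results/'},
)
print('Query started:', response['QueryExecutionId'])

You would then poll get_query_execution until the query finishes and retrieve the rows with get_query_results.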
Upvotes: 1