Reputation: 96
We want to charge users based on the amount of traffic their data has. Actually the amount of downstream bandwidth their data is consuming.
I have exported google cloud storage access_logs. From the logs, I can count the number of times a file is accessed. (filesize * count will be the bandwidth usage)
But the problem is that this doesn't work well with cached content. My calculated value is much more than the actual usage.
I went with this method because our traffic will be new and won't use the cache, which means that the difference won't matter. But in reality, it seems like it is a real problem.
This is a common use case and I think there should be a better way to solve this problem with google cloud storage.
{
"insertId": "-tohip8e1vmvw",
"logName": "projects/bucket/logs/cloudaudit.googleapis.com%2Fdata_access",
"protoPayload": {
"@type": "type.googleapis.com/google.cloud.audit.AuditLog",
"authenticationInfo": {
"principalEmail": "[email protected]"
},
"authorizationInfo": [
{
"granted": true,
"permission": "storage.objects.get",
"resource": "projects/_/bucket/bucket.appspot.com/objects/users/2y7aPImLYeTsCt6X0dwNMlW9K5h1/somefile",
"resourceAttributes": {}
},
{
"granted": true,
"permission": "storage.objects.getIamPolicy",
"resource": "projects/_/bucket/bucket.appspot.com/objects/users/2y7aPImLYeTsCt6X0dwNMlW9K5h1/somefile",
"resourceAttributes": {}
}
],
"methodName": "storage.objects.get",
"requestMetadata": {
"destinationAttributes": {},
"requestAttributes": {
"auth": {},
"time": "2019-07-02T11:58:36.068Z"
}
},
"resourceLocation": {
"currentLocations": [
"eu"
]
},
"resourceName": "projects/_/bucket/bucket.appspot.com/objects/users/2y7aPImLYeTsCt6X0dwNMlW9K5h1/somefile",
"serviceName": "storage.googleapis.com",
"status": {}
},
"receiveTimestamp": "2019-07-02T11:58:36.412798307Z",
"resource": {
"labels": {
"bucket_name": "bucket.appspot.com",
"location": "eu",
"project_id": "project-id"
},
"type": "gcs_bucket"
},
"severity": "INFO",
"timestamp": "2019-07-02T11:58:36.062Z"
}
An entry of the log.
We are using a single bucket for now. Can also use multiple if it helps.
Upvotes: 2
Views: 643
Reputation: 96
One possibility is to have a separate bucket for each user and get the bucket's bandwidth usage through timeseries api.
The endpoint for this purpose is:
https://cloud.google.com/monitoring/api/ref_v3/rest/v3/projects.timeSeries/list
And following are the parameters to achieve bytes sent for one hour (we can specify time range above 60s) whose sum will be the total bytes sent from the bucket.
{
"dataSets": [
{
"timeSeriesFilter": {
"filter": "metric.type=\"storage.googleapis.com/network/sent_bytes_count\" resource.type=\"gcs_bucket\" resource.label.\"project_id\"=\"<<<< project id here >>>>\" resource.label.\"bucket_name\"=\"<<<< bucket name here >>>>\"",
"perSeriesAligner": "ALIGN_SUM",
"crossSeriesReducer": "REDUCE_SUM",
"secondaryCrossSeriesReducer": "REDUCE_SUM",
"minAlignmentPeriod": "3600s",
"groupByFields": [
"resource.label.\"bucket_name\""
],
"unitOverride": "By"
},
"targetAxis": "Y1",
"plotType": "LINE",
"legendTemplate": "${resource.labels.bucket_name}"
}
],
"options": {
"mode": "COLOR"
},
"constantLines": [],
"timeshiftDuration": "0s",
"y1Axis": {
"label": "y1Axis",
"scale": "LINEAR"
}
}
Upvotes: 2