Reputation: 144
I have read some documentation on the internet, official and unofficial, and I am currently unable to import the BigQuery logs with resource type "bigquery_resource" (to get all the insert, update, merge, ... processing on my GCP project) from a GCP project where I am owner, using Python on my local machine.
Mandatory prerequisite:
Below is the code where I try to get the logs:
import google.protobuf
from google.cloud.bigquery_logging_v1 import AuditData
import google.cloud.logging
from datetime import datetime, timedelta, timezone
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="C:\\mypath\\credentials.json"
project_id = os.environ["GOOGLE_CLOUD_PROJECT"] = "project1"
yesterday = datetime.now(timezone.utc) - timedelta(days=2)
time_format = "%Y-%m-%dT%H:%M:%S.%f%z"
filter_str = (
    f'logName="projects/{project_id}/logs/cloudaudit.googleapis.com%2Factivity"'
    f' AND resource.type="bigquery_resource"'
    f' AND timestamp>="{yesterday.strftime(time_format)}"'
)
client = google.cloud.logging.Client(project="project1")
for entry in client.list_entries(filter_=filter_str):
    decoded_entry = entry.to_api_repr()
    #print(decoded_entry)
    print(entry)  # the same output as print(decoded_entry)
open("C:\\mypath\\logs.txt", "w").close()
with open("C:\\mypath\\logs.txt", "w") as f:
    for entry in client.list_entries(filter_=filter_str):
        f.write(entry)
Unfortunately it doesn't work (and my code is messy): the entry variable is a ProtobufEntry, and I don't know how to get my data out of my GCP project in a proper way.
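For reference, I would have expected something like the sketch below to write the entries to a file, since to_api_repr() gives back a plain dict that json can serialize (this is only my guess, reusing the client and filter_str from above; I am not sure it is the proper way):
import json

with open("C:\\mypath\\logs.txt", "w") as f:
    for entry in client.list_entries(filter_=filter_str):
        # to_api_repr() returns the entry as a JSON-serializable dict
        f.write(json.dumps(entry.to_api_repr()) + "\n")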
Any help is welcome! (Please don't answer with a deprecated answer from ChatGPT.)
Upvotes: 0
Views: 668
Reputation: 144
Here is how I export my logs without creating a bucket, a sink, Pub/Sub, a Cloud Function, a table in BigQuery, etc.
=> Only one service account with rights on my project and one .py script on my local machine, plus an option in the Python script to scan only the BigQuery resource during the last hour.
I add the path of gcloud because I have some problems with the PATH environment variable on my local machine when using Popen; you may not need to do this.
from subprocess import Popen, PIPE
import json
import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "C:\\Users\\USERAAAA\\Documents\\Python Scripts\\credentials.json"
gcloud_path = "C:\\Program Files (x86)\\Google\\Cloud SDK\\google-cloud-sdk\\bin\\gcloud.cmd"

# Read the last hour of BigQuery data-access audit logs through the gcloud CLI
process = Popen([gcloud_path, "logging", "read", "resource.type=bigquery_resource AND logName=projects/PROJECTGCP1/logs/cloudaudit.googleapis.com%2Fdata_access", "--freshness=1h"], stdout=PIPE, stderr=PIPE)
stdout, stderr = process.communicate()
output_str = stdout.decode()

# Write the raw output string into a file
with open("C:\\Users\\USERAAAA\\Documents\\Python_Scripts\\testes.txt", "w") as f:
    f.write(output_str)
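If structured records are more convenient than the default text output, gcloud also accepts the global --format=json flag, so the same call can be parsed back into Python objects before writing. A small sketch building on the code above (same gcloud_path, filter and file paths assumed):
process = Popen([gcloud_path, "logging", "read",
                 "resource.type=bigquery_resource AND logName=projects/PROJECTGCP1/logs/cloudaudit.googleapis.com%2Fdata_access",
                 "--freshness=1h", "--format=json"], stdout=PIPE, stderr=PIPE)
stdout, stderr = process.communicate()
entries = json.loads(stdout.decode())  # list of log entry dicts
with open("C:\\Users\\USERAAAA\\Documents\\Python_Scripts\\testes.txt", "w") as f:
    json.dump(entries, f, indent=2)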
Upvotes: 1
Reputation: 1422
One way to achieve this is as follows:
Create a dedicated logging sink for BigQuery logs:
gcloud logging sinks create my-example-sink bigquery.googleapis.com/projects/my-project-id/datasets/auditlog_dataset \
--log-filter='protoPayload.metadata."@type"="type.googleapis.com/google.cloud.audit.BigQueryAuditMetadata"'
The above command creates a logging sink that routes only BigQueryAuditMetadata messages into a dataset named auditlog_dataset. Refer to BigQueryAuditMetadata for all the events that are captured as part of GCP AuditData.
Create a service account and give it access to the dataset created above.
For creating a service account refer here, and for granting access to a dataset refer here.
Use this service account to authenticate from your local environment and query the dataset created above with the BigQuery Python client to get the filtered BigQuery data.
from google.cloud import bigquery
client = bigquery.Client()
# Select rows from log dataset
QUERY = (
    'SELECT name FROM `MYPROJECTID.MYDATASETID.cloudaudit_googleapis_com_activity` '
    'LIMIT 100')
query_job = client.query(QUERY) # API request
rows = query_job.result() # Waits for query to finish
for row in rows:
    print(row.name)
Also, you can query the audit tables from the console directly.
Reference BigQuery audit logging.
Another option is to use a Python script to query log events. One more option is to use Cloud Pub/Sub to route logs to clients outside of GCP, as sketched below.
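A minimal sketch of the Pub/Sub option, assuming a logging sink already routes the BigQuery audit logs to a topic and that a pull subscription named bq-audit-logs-sub exists in my-project-id (both names are hypothetical here); each Pub/Sub message then carries one LogEntry serialized as JSON:
from concurrent.futures import TimeoutError
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
# Hypothetical project and subscription names; replace with your own.
subscription_path = subscriber.subscription_path("my-project-id", "bq-audit-logs-sub")

def callback(message: pubsub_v1.subscriber.message.Message) -> None:
    # The message payload is the LogEntry serialized as JSON.
    print(message.data.decode("utf-8"))
    message.ack()

streaming_pull_future = subscriber.subscribe(subscription_path, callback=callback)
with subscriber:
    try:
        # Listen for messages for 30 seconds, then stop.
        streaming_pull_future.result(timeout=30)
    except TimeoutError:
        streaming_pull_future.cancel()
        streaming_pull_future.result()  # block until the shutdown completes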
I mostly prefer to keep the filtered logs in a dedicated Log Analytics bucket, query them as needed, and create custom log-based metrics with Cloud Monitoring. Moving logs out of GCP may incur network egress charges; refer to the documentation if you are querying a large volume of data.
Upvotes: 0