Reputation: 77
Looking for best-practice suggestions for this connection, i.e. a good solution that can do the heavy lifting of medium-sized datasets into a pandas DataFrame. What is the boto3 equivalent for BigQuery?
Upvotes: 0
Views: 72
Reputation: 490
Make a service account JSON secret/file (created in the GCP console). The scopes line is added by hand if it wasn't generated with the file.
service_account.json:
{
"type": "service_account",
"project_id": "projectid",
"private_key_id": "xxxx",
"private_key": "-----BEGIN PRIVATE KEY-----",
"client_email": "serviceaccount.iam.gserviceaccount.com",
"client_id": "xxxxx",
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://oauth2.googleapis.com/token",
"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
"client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/xxxxx",
"scopes": ["https://www.googleapis.com/auth/drive","https://www.googleapis.com/auth/cloud-platform","https://www.googleapis.com/auth/bigquery"]
}
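If the key file was generated without the scopes entry, the scopes can instead be passed when building the credentials in Python. A minimal sketch (the file path and scope list below are placeholders, not from the original file):
from google.oauth2 import service_account

# attach scopes explicitly in case the JSON key has no "scopes" entry
scoped_credentials = service_account.Credentials.from_service_account_file(
    "path/to/service_account.json",  # placeholder path
    scopes=[
        "https://www.googleapis.com/auth/cloud-platform",
        "https://www.googleapis.com/auth/bigquery",
    ],
)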
Within a Python project: install the required packages (google.oauth2 is provided by google-auth, google.cloud.bigquery by google-cloud-bigquery; see the reference for other google-cloud packages that may be needed). Place service_account.json in a directory, pass the file path to a credentials variable, and use credentials in the script.
Sample Script:
from google.oauth2 import service_account
from google.cloud import bigquery
import pandas as pd
# setup credentials for connecting to GCP
service_account_file_path = "Path to service_account.json"
gcp_credentials = service_account.Credentials.from_service_account_file(service_account_file_path)
# sample function for loading a pandas df to a bigquery table
def load_to_gbq(df):
    client = bigquery.Client(credentials=gcp_credentials)
    project_id = "project_id"
    dataset_id = "dataset_id"
    table_id = "table_id"
    table_ref = f"{project_id}.{dataset_id}.{table_id}"
    # Upload the DataFrame to the existing table
    job_config = bigquery.LoadJobConfig()
    job = client.load_table_from_dataframe(df, table_ref, job_config=job_config)
    job.result()  # wait for the load job to finish
    return
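Note: in recent versions of the library, client.load_table_from_dataframe serializes the DataFrame with pyarrow, so pyarrow should be installed alongside google-cloud-bigquery.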
# function for reading from a bigquery table into a df
def extract_data_gbq(start_date, end_date):
    client = bigquery.Client(credentials=gcp_credentials)
    project_id = "project_id"
    dataset_id = "dataset_id"
    table_id = "table_id"
    # fully qualified table reference
    main_table_ref = f"{project_id}.{dataset_id}.{table_id}"
    query = f"""
    SELECT
        *
    FROM `{main_table_ref}`
    WHERE
        transaction_date BETWEEN '{start_date}' AND '{end_date}'
    """
    extract_job = client.query(query)
    extract_job_result = extract_job.result()  # wait for the query to finish
    df = extract_job_result.to_dataframe()
    return df
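Hypothetical usage of the two functions above (the dates are placeholders, and the target table is assumed to already exist):
# read a date range into a DataFrame, then load it into the target table
df = extract_data_gbq("2024-01-01", "2024-01-31")
load_to_gbq(df)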
Upvotes: 0