sweez

Reputation: 77

Connect to BigQuery from Python, best practice

Looking for best-practice suggestions for this connection; I need a solution that can do the heavy lifting of moving medium-sized datasets into a pandas DataFrame. What is the boto3 equivalent for BigQuery?

Upvotes: 0

Views: 72

Answers (2)

hSin

Reputation: 490

Make a service account JSON secret/file:

  1. In the Cloud Console, hover over IAM & Admin in the left navbar and click Service Accounts.
  2. For the desired service account, click the three dots on the right in the Actions column and select Manage keys.
  3. Add a key; this generates the service account JSON file, which should look like the one below. Make sure the scopes line is added if it wasn't generated with the file.

service_account.json :

{
 "type": "service_account",
 "project_id": "projectid",
 "private_key_id": "xxxx",
 "private_key": "-----BEGIN PRIVATE KEY-----",
 "client_email": "serviceaccount.iam.gserviceaccount.com",
 "client_id": "xxxxx",
 "auth_uri": "https://accounts.google.com/o/oauth2/auth",
 "token_uri": "https://oauth2.googleapis.com/token",
 "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
 "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/xxxxx",
 "scopes": ["https://www.googleapis.com/auth/drive","https://www.googleapis.com/auth/cloud-platform","https://www.googleapis.com/auth/bigquery"]
}
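
As an alternative to editing the key file, the google-auth library also accepts scopes when you load it; a minimal sketch (the file path and scope list are placeholders to adjust):

from google.oauth2 import service_account

# placeholder path and scopes -- adjust to your setup
scoped_credentials = service_account.Credentials.from_service_account_file(
    "Path to service_account.json",
    scopes=[
        "https://www.googleapis.com/auth/cloud-platform",
        "https://www.googleapis.com/auth/bigquery",
    ],
)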

Within a Python project:

  1. Install the required packages (google-cloud-bigquery for google.cloud, google-auth for google.oauth2); see the google-cloud package reference for any others you may need.
  2. Put service_account.json in a directory, pass its path when building a credentials object, and use those credentials throughout the script.

Sample Script

from google.oauth2 import service_account
from google.cloud import bigquery
import pandas as pd

# setup credentials for connecting to GCP
service_account_file_path = "Path to service_account.json"
gcp_credentials = service_account.Credentials.from_service_account_file(service_account_file_path)

# sample function for loading a pandas df to a bigquery table
def load_to_gbq(df):
    project_id = "project_id"
    dataset_id = "dataset_id"
    table_id = "table_id"
    # pass the project explicitly so the client doesn't fall back to environment defaults
    client = bigquery.Client(credentials=gcp_credentials, project=project_id)

    table_ref = f"{project_id}.{dataset_id}.{table_id}"
    # Upload the DataFrame to the existing table and wait for the load job to finish
    job_config = bigquery.LoadJobConfig()
    job = client.load_table_from_dataframe(df, table_ref, job_config=job_config)
    job.result()
    return

# function for reading from a bigquery table into a df
def extract_data_gbq(start_date, end_date):
    project_id = "project_id"
    dataset_id = "dataset_id"
    table_id = "table_id"
    client = bigquery.Client(credentials=gcp_credentials, project=project_id)
    main_table_ref = f"{project_id}.{dataset_id}.{table_id}"
    # NOTE: for untrusted input, prefer query parameters over f-string interpolation
    query = f"""
            SELECT
            *
            FROM `{main_table_ref}`
            WHERE
            transaction_date BETWEEN '{start_date}' AND '{end_date}'
            """

    extract_job = client.query(query)
    extract_job_result = extract_job.result()
    df = extract_job_result.to_dataframe()
    return df
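
Example usage of the two functions above (the date range is a placeholder, and transaction_date is assumed to exist in the source table):

# read a date range into a DataFrame, then load it into the target table
df = extract_data_gbq("2024-01-01", "2024-01-31")
load_to_gbq(df)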

Upvotes: 0

marky

Reputation: 153

The Google Cloud BigQuery client libraries are the nearest equivalent to boto3; please see this link for the guide and how-tos. There is also a free trial that you can use to explore and test what you need here.
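
A minimal sketch of the BigQuery client library, assuming Application Default Credentials are already configured (the query is just a placeholder):

from google.cloud import bigquery

client = bigquery.Client()  # picks up Application Default Credentials and the default project
# to_dataframe() needs pandas (and db-dtypes) installed
df = client.query("SELECT 1 AS x").result().to_dataframe()
print(df)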

Upvotes: 0
