Saurav Agarwal

Reputation: 69

Exporting BigQuery table data to Google Cloud Storage with a WHERE clause using Python

I want to export table data from BigQuery to Google Cloud Storage. The problem is that I need the data between date1 and date2, not the whole table.

from google.cloud import bigquery

client = bigquery.Client()

extract_job = client.extract_table(
    table_ref,         # reference to the source table
    destination_uri,   # 'gs://<bucket>/<file>' target in Cloud Storage
    # Location must match that of the source table.
    location='US')  # API request
extract_job.result()  # Waits for the job to complete.

This is what I found in the Google Cloud documentation. There is no option to add a query or limit the data with a WHERE clause.

Upvotes: 6

Views: 7034

Answers (4)

Kaustubh Joshi

Reputation: 71

Use the EXPORT DATA OPTIONS statement in native BigQuery SQL to export the result of a SQL query.

Use the Python client to submit the SQL to BigQuery, which takes care of the rest.

from google.cloud import bigquery

BQ = bigquery.Client()

def gcp_export_http(request):
    # EXPORT DATA runs the SELECT and writes the result directly to GCS.
    sql = """
    EXPORT DATA OPTIONS(
        uri='gs://gcs-bucket/*',
        format='PARQUET',
        compression='SNAPPY') AS
    SELECT * FROM table_name WHERE column_name > column_value
    """

    query_job = BQ.query(sql)  # Submit the export statement to BigQuery.
    res = query_job.result()   # Wait for the export to complete.
    return res

Upvotes: 2

muntasir kabir

Reputation: 258

Solution: exporting BigQuery data to Google Cloud Storage with a WHERE clause using Python

from google.cloud import bigquery
from google.cloud import storage

def export_to_gcs():
    QUERY = "SELECT * FROM TABLE where CONDITION"  # change the table and WHERE condition
    bq_client = bigquery.Client()
    query_job = bq_client.query(QUERY)  # BigQuery API request
    rows_df = query_job.result().to_dataframe()  # requires pandas to be installed

    storage_client = storage.Client()  # Storage API request
    bucket = storage_client.get_bucket(BUCKETNAME)  # change the bucket name
    blob = bucket.blob('temp/Add_to_Cart.csv')
    blob.upload_from_string(
        rows_df.to_csv(sep=';', index=False, encoding='utf-8'),
        content_type='application/octet-stream')
    return "success"

Upvotes: 1

Alexey Maloletkin

Reputation: 1099

Unfortunately, this will be a two-step process: first you write the query result to a table, then you export that table. From a cost perspective the impact should be minimal - you pay for the storage used by the temporary result table, but at $0.02 per GB per month (roughly 730 hours), finishing the task within an hour costs about $0.000027 per GB.

import time

from google.cloud import bigquery

client = bigquery.Client()

# Step 1: write the filtered query result to a temporary table.
job_config = bigquery.QueryJobConfig()
gcs_filename = 'file_*.gzip'

table_ref = client.dataset(dataset_id).table('my_temp_table')
job_config.destination = table_ref
job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE

# Start the query, passing in the extra configuration.
query_job = client.query(
    """#standardSQL
    select * from `project.dataset.table` where <your_condition> ;""",
    location='US',
    job_config=job_config)

# Check that the temporary table was successfully written.
while not query_job.done():
    time.sleep(1)
print("query completed")

# Step 2: extract the temporary table to GCS as gzip-compressed CSV.
job_config = bigquery.ExtractJobConfig()
job_config.compression = bigquery.Compression.GZIP
job_config.destination_format = bigquery.DestinationFormat.CSV
job_config.print_header = False

destination_uri = 'gs://{}/{}'.format(bucket_name, gcs_filename)

extract_job = client.extract_table(
    table_ref,
    destination_uri,
    job_config=job_config,
    location='US')  # API request
extract_job.result()
print("extract completed")

Upvotes: 5

LundinCast

Reputation: 9810

Using the code you provided (following this doc), you can only export the whole table to GCS, not the result of a query.

Alternatively, you can download and save your query result to a local file and upload it to GCS. Or even easier, save the query result to a new BigQuery table and export that new table entirely to GCS with the code you used.
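A minimal sketch of the first alternative (run the filtered query, write the rows to a local file, then upload that file to Cloud Storage); the bucket, table, and date filter below are placeholders, not values from the question:

from google.cloud import bigquery
from google.cloud import storage

bq_client = bigquery.Client()
storage_client = storage.Client()

# Run the filtered query (table name and date range are placeholders).
rows = bq_client.query(
    "SELECT * FROM `project.dataset.table` "
    "WHERE event_date BETWEEN '2019-01-01' AND '2019-01-31'").result()

# Write the rows to a local CSV file.
local_path = '/tmp/filtered_export.csv'
with open(local_path, 'w') as f:
    for row in rows:
        f.write(','.join(str(value) for value in row.values()) + '\n')

# Upload the local file to GCS (bucket and object names are placeholders).
bucket = storage_client.get_bucket('my-bucket')
bucket.blob('exports/filtered_export.csv').upload_from_filename(local_path)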

Upvotes: 2
