Reputation: 69
I want to export table data from BigQuery to Google Cloud Storage. The problem is that I need only the data from date1 to date2, not the whole table.
extract_job = client.extract_table(
    table_ref,
    destination_uri,
    # Location must match that of the source table.
    location='US')  # API request
extract_job.result()
This is what I found in the Google Cloud documentation. There is no place to add a query or to limit the data with a WHERE clause.
Upvotes: 6
Views: 7034
Reputation: 71
Use the EXPORT DATA OPTIONS statement in native BigQuery SQL to export the result of a SQL query.
Use the Python client to submit the SQL to BigQuery, which takes care of the rest.
from google.cloud import bigquery

BQ = bigquery.Client()

def gcp_export_http(request):
    sql = """
    EXPORT DATA OPTIONS(
        uri="gs://gcs-bucket/*",
        format='PARQUET',
        compression='SNAPPY'
    ) AS
    SELECT * FROM table_name WHERE column_name > column_value
    """
    query_job = BQ.query(sql)
    res = query_job.result()
    return res
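To match the question's date1-to-date2 requirement, the filter in the EXPORT DATA statement can simply be a date range. A minimal sketch, assuming a hypothetical table `project.dataset.events` with a DATE column event_date and a placeholder bucket path:
from google.cloud import bigquery

client = bigquery.Client()

# EXPORT DATA writes the query result directly to GCS;
# the table, column and dates below are placeholders.
sql = """
EXPORT DATA OPTIONS(
    uri='gs://gcs-bucket/export/*.parquet',
    format='PARQUET',
    compression='SNAPPY'
) AS
SELECT *
FROM `project.dataset.events`
WHERE event_date BETWEEN DATE '2023-01-01' AND DATE '2023-01-31'
"""
client.query(sql).result()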
Upvotes: 2
Reputation: 258
Solution: exporting BigQuery data to Google Cloud Storage with a WHERE clause, using Python
from google.cloud import bigquery
from google.cloud import storage

def export_to_gcs():
    QUERY = "SELECT * FROM TABLE where CONDITION"  # change the table and where condition
    bq_client = bigquery.Client()
    query_job = bq_client.query(QUERY)  # BigQuery API request
    rows_df = query_job.result().to_dataframe()

    storage_client = storage.Client()  # Storage API request
    bucket = storage_client.get_bucket(BUCKETNAME)  # change the bucket name
    blob = bucket.blob('temp/Add_to_Cart.csv')
    blob.upload_from_string(rows_df.to_csv(sep=';', index=False, encoding='utf-8'),
                            content_type='application/octet-stream')
    return "success"
Upvotes: 1
Reputation: 1099
Unfortunately it will be a two-step process: first you build the result table, then you export the result. The cost impact should be minimal - you pay for the storage used by the temporary result table, but at $0.02 per GB per month, finishing the task within an hour costs about $0.000027 per GB.
import time

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.QueryJobConfig()
gcs_filename = 'file_*.gzip'

# Write the query result to a temporary table.
table_ref = client.dataset(dataset_id).table('my_temp_table')
job_config.destination = table_ref
job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE

# Start the query, passing in the extra configuration.
query_job = client.query(
    """#standardSQL
    select * from `project.dataset.table` where <your_condition> ;""",
    location='US',
    job_config=job_config)

while not query_job.done():
    time.sleep(1)
# check if table successfully written
print("query completed")

# Export the temporary table to GCS.
job_config = bigquery.ExtractJobConfig()
job_config.compression = bigquery.Compression.GZIP
job_config.destination_format = bigquery.DestinationFormat.CSV
job_config.print_header = False

destination_uri = 'gs://{}/{}'.format(bucket_name, gcs_filename)

extract_job = client.extract_table(
    table_ref,
    destination_uri,
    job_config=job_config,
    location='US')  # API request
extract_job.result()
print("extract completed")
Upvotes: 5
Reputation: 9810
Using the code you provided (following this doc), you can only export a whole table to GCS, not the result of a query.
Alternatively, you can download your query result, save it to a local file and upload that file to GCS. Or, even easier, save the query result to a new BigQuery table and export that new table entirely to GCS with the code you used.
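For the first alternative (download the result, then upload the file yourself), here is a minimal sketch, assuming a hypothetical bucket named my-bucket, a local temp file, and placeholder table/date values from the question:
from google.cloud import bigquery
from google.cloud import storage

bq_client = bigquery.Client()
storage_client = storage.Client()

# Run the filtered query and save the result to a local CSV file.
# The table name, date_column and date1/date2 are placeholders.
sql = "SELECT * FROM `project.dataset.table` WHERE date_column BETWEEN 'date1' AND 'date2'"
df = bq_client.query(sql).result().to_dataframe()
df.to_csv('/tmp/result.csv', index=False)

# Upload the local file to GCS.
bucket = storage_client.bucket('my-bucket')
bucket.blob('exports/result.csv').upload_from_filename('/tmp/result.csv')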
Upvotes: 2