Reputation: 13
I'm trying to download data from a BigQuery public dataset and store it locally in a CSV
file. When I add LIMIT 10
at the end of the query, my code works, but without it I get an error that says:
Response too large to return. Consider setting allowLargeResults to true in your job configuration.
Thank you in advance!
Here is my code:
import pandas as pd
import pandas_gbq as gbq
import tqdm

def get_data(query, project_id):
    data = gbq.read_gbq(query, project_id=project_id, configuration={"allow_large_results": True})
    data.to_csv('blockchain.csv', header=True, index=False)

if __name__ == "__main__":
    query = """SELECT * FROM `bigquery-public-data.crypto_bitcoin.transactions` WHERE block_timestamp>='2017-09-1' and block_timestamp<'2017-10-1';"""
    project_id = "bitcoin-274091"
    get_data(query, project_id)
Upvotes: 0
Views: 2928
Reputation: 5243
As mentioned by @Graham Polley, you may first consider saving the results of your source query to an intermediate BigQuery table and then extracting the data from that table to GCS. Due to the current pandas_gbq
library limitations, I would recommend using the google-cloud-bigquery
package, which is the officially recommended Python client library for the BigQuery API.
In the following example, I use the bigquery.Client.query() method to run a query job with a job_config
that writes the results to a destination table, and then the bigquery.Client.extract_table() method to export that table to a GCS bucket:
from google.cloud import bigquery

client = bigquery.Client()

# Placeholder identifiers for your own project, dataset and staging table
project_id = "project_id"
dataset = "dataset"
table = "table"

# Run the query and write its results to the staging table
job_config = bigquery.QueryJobConfig(destination=f"{project_id}.{dataset}.{table}")
sql = """SELECT * FROM ..."""
query_job = client.query(sql, job_config=job_config)
query_job.result()  # wait for the query job to finish

# Export the staging table as a CSV file to a GCS bucket
gs_path = "gs://bucket/test.csv"
ds = client.dataset(dataset, project=project_id)
tb = ds.table(table)
extract_job = client.extract_table(tb, gs_path, location='US')
extract_job.result()  # wait for the extract job to finish
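Since the goal in the question is a local CSV file, you can then download the exported object from the GCS bucket. A minimal sketch, assuming the google-cloud-storage package and the bucket/object names used in gs_path above:
from google.cloud import storage

# Assumed bucket and object names matching gs://bucket/test.csv above
storage_client = storage.Client()
bucket = storage_client.bucket("bucket")
blob = bucket.blob("test.csv")

# Download the exported file to a local CSV
blob.download_to_filename("blockchain.csv")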
At the end, you can delete the table containing the staging data.
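A minimal sketch of that cleanup step, reusing the client and the placeholder table names from above:
# Remove the staging table once the export has completed
client.delete_table(f"{project_id}.{dataset}.{table}", not_found_ok=True)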
Upvotes: 2