Aditya Chandrasekhar

Reputation: 13

Downloading large data from a BigQuery dataset with pandas

I'm trying to download data from a BigQuery public dataset and store it locally in a CSV file. When I add LIMIT 10 at the end of the query, my code works, but without it I get this error:

Response too large to return. Consider setting allowLargeResults to true in your job configuration. 

Thank you in Advance!

Here is my code:

import pandas as pd
import pandas_gbq as gbq
import tqdm

def get_data(query,project_id):
    data = gbq.read_gbq(query, project_id=project_id,configuration={"allow_large_results":True})
    data.to_csv('blockchain.csv',header=True,index=False)

if __name__ == "__main__":
    query = """SELECT * FROM `bigquery-public-data.crypto_bitcoin.transactions` WHERE block_timestamp>='2017-09-1' and block_timestamp<'2017-10-1';"""
    project_id = "bitcoin-274091"
    get_data(query,project_id)   

Upvotes: 0

Views: 2928

Answers (1)

Nick_Kh

Reputation: 5243

As @Graham Polley mentioned, you can first save the results of your source query to a BigQuery table and then extract the data from that table to GCS. Because of current limitations in the pandas_gbq library, I would recommend using the google-cloud-bigquery package instead, as it is the officially recommended Python library for interacting with the BigQuery API.

In the following example, I use the bigquery.Client.query() method to run a query job whose job_config sets a destination table, and then call bigquery.Client.extract_table() to export that table to a GCS bucket:

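from google.cloud import bigquery

client = bigquery.Client()

# Fully qualified ID of the staging table (placeholder, replace with your own).
table_id = "project_id.dataset.table"
job_config = bigquery.QueryJobConfig(destination=table_id)

sql = """SELECT  * FROM ..."""

# Run the query and write its results to the staging table.
query_job = client.query(sql, job_config=job_config)
query_job.result()  # Wait for the query to finish.

# Export the staging table to GCS. A single file can hold at most 1 GB,
# so larger exports need a wildcard URI such as "gs://bucket/test-*.csv".
gs_path = "gs://bucket/test.csv"

extract_job = client.extract_table(table_id, gs_path, location='US')
extract_job.result()  # Wait for the export to finish.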
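
Since the question asks for a local CSV, the exported file(s) still have to be copied down from the bucket. Here is a minimal sketch of that step, assuming the google-cloud-storage package is installed; download_export and parse_gs_uri are hypothetical helper names of my own, and the bucket and paths are placeholders:

```python
from pathlib import Path


def parse_gs_uri(uri):
    """Split "gs://bucket/path/file.csv" into (bucket, object_name)."""
    if not uri.startswith("gs://"):
        raise ValueError(f"not a GCS URI: {uri}")
    bucket, _, name = uri[len("gs://"):].partition("/")
    return bucket, name


def download_export(storage_client, gs_path, local_dir="."):
    """Download every object matching the export path into local_dir.

    storage_client is expected to be a google.cloud.storage.Client.
    """
    bucket_name, name = parse_gs_uri(gs_path)
    # For a wildcard export ("test-*.csv"), list by the part before "*".
    prefix = name.split("*", 1)[0]
    local_dir = Path(local_dir)
    local_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for blob in storage_client.list_blobs(bucket_name, prefix=prefix):
        target = local_dir / Path(blob.name).name
        blob.download_to_filename(str(target))
        saved.append(target)
    return saved


# Example usage (needs google-cloud-storage and valid credentials):
#   from google.cloud import storage
#   download_export(storage.Client(), "gs://bucket/test.csv", "downloads")
```

Note that listing by prefix also picks up any other objects in the bucket that share the same prefix, so it is safest to export into a dedicated folder.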

At the end, you can delete the table that holds the staging data.
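
A small sketch of that cleanup step, using bigquery.Client.delete_table(); drop_staging_table is a hypothetical wrapper name, and the table ID is a placeholder:

```python
def drop_staging_table(client, table_id):
    """Delete the staging table; do nothing if it is already gone.

    client is expected to be a google.cloud.bigquery.Client and
    table_id a fully qualified "project.dataset.table" string.
    """
    client.delete_table(table_id, not_found_ok=True)


# Example usage (needs google-cloud-bigquery and valid credentials):
#   from google.cloud import bigquery
#   drop_staging_table(bigquery.Client(), "project_id.dataset.table")
```

Passing not_found_ok=True makes the call safe to repeat, for example if the script is re-run after a partial failure.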

Upvotes: 2
