Willie S

Reputation: 83

Retrieve Data from BigQuery to CSV file

I am working on downloading some data from BigQuery and wrote the code below. Since I am not the owner and don't have permission to create objects in BigQuery, I used a standard SQL query to load the 21 million rows of records. The performance is really bad (Windows + Anaconda): it takes at least 1 hour to download 1 million rows, even with pyarrow installed, and there is no progress indication at all.

I may also need to do some ETL and write the results back to a database table. I am not sure whether performance would improve if I downloaded the target table(s) directly into a database instead.

Could anyone give me any clues? Thanks in advance. Regards, AM

Here is my code:

from google.cloud import bigquery
from google.oauth2 import service_account

# Authenticate with a service account key file and create the client
credentials = service_account.Credentials.from_service_account_file('D:/secure/myprojid-123a1285876z.json')
project_id = 'myprojid'
client = bigquery.Client(credentials=credentials, project=project_id)

query_job = client.query("""
  SELECT *
  FROM big_query_schema.bg_table_name_01
  """)

results = query_job.result()   # Waits for the job to complete.
df = results.to_dataframe()    # Materializes all 21 million rows in memory

df.to_csv('D:/temp/out.txt', index=False, sep='|')

Upvotes: 1

Views: 106

Answers (1)

Lak

Reputation: 4166

The problem is probably that you are loading the entire result set into an in-memory dataframe before writing anything out.

Two solutions:

  1. Use bq extract on the command line.
  2. Page through the results and write them out row by row (for example with list_rows or the BigQuery Storage API); see the sketches below.
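
For option 1, a command along these lines exports a table straight to Cloud Storage as sharded CSV files (the bucket name is just a placeholder, and bq extract works on tables, so the data would already have to live in one):

bq extract --destination_format CSV --field_delimiter '|' 'myprojid:big_query_schema.bg_table_name_01' gs://your-bucket/bg_table_name_01-*.csv

For option 2, here is a minimal sketch that streams the query results page by page and appends each row to the CSV instead of building a 21-million-row dataframe in memory (the page_size value is only an assumption; tune it for your environment):

import csv

from google.cloud import bigquery
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file('D:/secure/myprojid-123a1285876z.json')
client = bigquery.Client(credentials=credentials, project='myprojid')

query_job = client.query("""
  SELECT *
  FROM big_query_schema.bg_table_name_01
  """)

rows = query_job.result(page_size=100000)  # RowIterator fetches result pages lazily

with open('D:/temp/out.txt', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f, delimiter='|')
    writer.writerow([field.name for field in rows.schema])  # header row
    for row in rows:  # iterates page by page under the hood
        writer.writerow(list(row.values()))

If google-cloud-bigquery-storage is installed alongside google-cloud-bigquery, recent client versions can also pull results over the faster Storage Read API (for example in to_dataframe), which may speed up the original approach as well.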

Upvotes: 1
