NotSoShabby

Reputation: 3678

Export BigQuery table locally

I have a BigQuery table that I would like to load into a pandas DataFrame. The table is big, and the pd.read_gbq() function gets stuck and does not manage to retrieve the data.

I implemented a chunking mechanism using pandas that works, but it takes a long time to fetch (an hour for 9M rows), so I'm looking for a different solution.
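
For context, a chunked fetch along these lines can be sketched as paging through the table with LIMIT/OFFSET queries (the project, table name and chunk size below are placeholders, not my actual setup):

import pandas as pd

# Placeholder identifiers -- substitute the real project and table.
project_id = 'my-project'
table = 'my-project.my_dataset.my_table'
chunk_size = 500000

chunks = []
offset = 0
while True:
    # Note: without an ORDER BY, BigQuery does not guarantee a stable
    # row order between pages.
    query = 'SELECT * FROM `{}` LIMIT {} OFFSET {}'.format(table, chunk_size, offset)
    chunk = pd.read_gbq(query, project_id=project_id)
    if chunk.empty:
        break
    chunks.append(chunk)
    offset += chunk_size

df = pd.concat(chunks, ignore_index=True)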

I would like to download the table as a CSV file and then read it. I saw this code in the Google Cloud docs:

from google.cloud import bigquery

client = bigquery.Client()
bucket_name = 'my-bucket'
project = 'bigquery-public-data'
dataset_id = 'samples'
table_id = 'shakespeare'

destination_uri = 'gs://{}/{}'.format(bucket_name, 'shakespeare.csv')
dataset_ref = client.dataset(dataset_id, project=project)
table_ref = dataset_ref.table(table_id)

extract_job = client.extract_table(
    table_ref,
    destination_uri,
    # Location must match that of the source table.
    location='US')  # API request
extract_job.result()  # Waits for job to complete.

print('Exported {}:{}.{} to {}'.format(
    project, dataset_id, table_id, destination_uri))

but all the URIs shown in the examples are Google Cloud Storage bucket URIs, not local paths, and I didn't manage to download it (I tried passing a local path as the URI, which gave me an error).

Is there a way to download the table's data as a CSV file without using a bucket?

Upvotes: 2

Views: 1555

Answers (1)

DeadPool

Reputation: 69

As mentioned here:

The limitation with BigQuery export is that you cannot export data to a local file or to Google Drive, but you can save query results to a local file. The only supported export location is Cloud Storage.

Is there a way to download the table's data as a CSV file without using a bucket?

So, since we know that query results can be saved to a local file, you can use something like this:

from google.cloud import bigquery

client = bigquery.Client()

# Perform a query.
QUERY = (
    'SELECT * FROM `project_name.dataset_name.table_name`')
query_job = client.query(QUERY)  # API request
rows = query_job.result()  # Waits for query to finish

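# Iterate over the rows; each field is accessible by column name
# (the 'name' column here is just an example).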
for row in rows:
    print(row.name)

This rows variable will contain all the table rows, and you can either use it directly or write it out to a local file.
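
For example, a minimal sketch of saving the result to a local CSV file (the table name is a placeholder, and to_dataframe() requires pandas to be installed):

from google.cloud import bigquery

client = bigquery.Client()

QUERY = 'SELECT * FROM `project_name.dataset_name.table_name`'
query_job = client.query(QUERY)          # API request
df = query_job.result().to_dataframe()   # Waits for the query, then loads all rows into a DataFrame

# Write the rows to a local CSV file
df.to_csv('table_export.csv', index=False)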

Upvotes: 1
