Reputation: 568
I have a bigquery table - I would like to extraxt it into a pandas dataframe inside cloud function and then do some changes in the header file and later save it into Cloud storage. Unfortunately my function is not working, can anyone see what could be the issue. Do i need to use big query extract job or my idea is also valid?
import base64
import pandas as pd
from google.cloud import bigquery
def extract_partial_return(event, context):
client = bigquery.Client()
bucket_name = "abc_test"
project = "bq_project"
dataset_id = "bq_dataset"
table_id = "Partial_Return_Table"
sql = """
SELECT * FROM `bq_project.bq_dataset.Partial_Return_Table`
"""
# Running the query and putting the results directly into a df
df = client.query(sql).to_dataframe()
df.columns = ["ga:t_Id", "ga:product", "ga:quantity"]
destination_uri = (
"gs://abc_test/Exports/Partial_Return_Table.csv"
)
df.to_csv(destination_uri)
My requirement.txt looks like this
# Function dependencies, for example:
# package>=version
google-cloud-bigquery
pandas
pyarrow
Upvotes: 1
Views: 1305
Reputation: 568
pyarrow library is the key here
import base64
import pandas as pd
from google.cloud import bigquery
def extract_partial_return(event, context):
client = bigquery.Client()
sql = """
SELECT * FROM `bq_project.bq_dataset.Partial_Return_Table`
"""
# Running the query and putting the results directly into a df
df = client.query(sql).to_dataframe()
df.columns = ["ga:t_Id", "ga:product", "ga:quantity"]
destination_uri = ("gs://abc_test/Exports/Partial_Return_Table.csv")
df.to_csv(destination_uri)
requirement.txt
pandas
fsspec
gcsfs
google-cloud-bigquery
google-cloud-storage
pyarrow
Upvotes: 2
Reputation: 314
First of all, the idea to do this is okay, however you may want to use other product such as Dataflow or Dataproc which are designed for these purposes.
On the other hand, in order to complete the idea that you have now, you should take care of the way you are constructing the SQL command because you are not using the variables created for the project, dataset, etc. The same issue happens on the bucket. Moreover, I think that you are lacking a couple of dependencies (fsspec and gcsfs).
Manuel
Upvotes: 1