Yehoshaphat Schellekens

Reputation: 2385

Load Pandas DF to Big Query fails

I'm using the following code (based on the pandas-gbq-migration example):

from google.cloud import bigquery
import pandas
import os

# Point the client at the service account credentials
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "link_to_credentials.json"

df = pandas.DataFrame(
    {
        'my_string': ['a', 'b', 'c'],
        'my_int64': [1, 2, 3],
        'my_float64': [4.0, 5.0, 6.0],
    }
)

client = bigquery.Client()
dataset_ref = client.dataset('TMP')
table_ref = dataset_ref.table('yosh_try_uload_from_client')

# Upload the dataframe and block until the load job finishes
client.load_table_from_dataframe(df, table_ref).result()

And I'm getting the following error:

ImportError: Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'.
pyarrow or fastparquet is required for parquet support 

After looking at some SO questions like this one: google-cloud-bigquery-load-table-from-dataframe-parquet-attributeerror (and the migration guide: https://cloud.google.com/bigquery/docs/pandas-gbq-migration)

I understand that I need to change something in the configuration (maybe add a schema?).

Can someone help me out here? I didn't manage to understand from the docs how to do that.

Thanks in advance!

Upvotes: 4

Views: 5569

Answers (1)

Lefteris S

Reputation: 1672

You need to install pyarrow (the docs indicate that an ImportError will be raised unless you have a parquet engine). The load_table_from_dataframe method writes the dataframe to a parquet file and sets the source format to parquet in the load job. I'm not really sure why this choice was made, but it's hard-coded, and installing pyarrow is more straightforward and safer than rolling your own implementation with a different format.
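For completeness, here's a minimal sketch of what that can look like once pyarrow is installed (pip install pyarrow). The explicit schema on the LoadJobConfig is an optional assumption on my part, showing how you might pin the BigQuery column types instead of letting them be inferred from the dataframe; depending on your client library version you may simply omit it:

from google.cloud import bigquery
import pandas

# Requires a parquet engine: pip install pyarrow
client = bigquery.Client()
table_ref = client.dataset('TMP').table('yosh_try_uload_from_client')

df = pandas.DataFrame(
    {
        'my_string': ['a', 'b', 'c'],
        'my_int64': [1, 2, 3],
        'my_float64': [4.0, 5.0, 6.0],
    }
)

# Optional: declare the column types explicitly (hypothetical schema
# matching the example dataframe) rather than relying on inference.
job_config = bigquery.LoadJobConfig()
job_config.schema = [
    bigquery.SchemaField('my_string', 'STRING'),
    bigquery.SchemaField('my_int64', 'INTEGER'),
    bigquery.SchemaField('my_float64', 'FLOAT'),
]

client.load_table_from_dataframe(df, table_ref, job_config=job_config).result()

The essential fix is installing pyarrow; the job_config part is just one way to take control of the resulting table's schema.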

Upvotes: 6
