Matteo Felici

Reputation: 1107

BigQuery Python API - Write DataFrame ordered by column

I'm trying to write a pandas.DataFrame to BigQuery using the Python API, with the records sorted by a column:

import pandas as pd
from google.cloud import bigquery

client = bigquery.Client(project=project_id)

df = pd.DataFrame(...)
df.sort_values('date', inplace=True)

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("date", "DATE"),
        bigquery.SchemaField("col1", "INTEGER"),
        bigquery.SchemaField("col2", "INTEGER"),
        bigquery.SchemaField("col3", "STRING")
    ],
    write_disposition="WRITE_TRUNCATE"
)

job_update = client.load_table_from_dataframe(
    df, output_table, job_config=job_config
)

The load job creates the table with the correct values, but the rows are not ordered by date. Is there a parameter or method to define the order in the job_config?

Upvotes: 0

Views: 756

Answers (1)

Prajna Rai T

Reputation: 1810

As mentioned in the comments on the question, there is no parameter or method in the job_config to define the row order by a specific column.

As with most relational databases, data in BigQuery should never be considered sorted. If you need it sorted, you have to specify that in the query you use to retrieve the data, with an ORDER BY clause.
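
For illustration, here is a minimal sketch of sorting at read time instead of at load time. The helper names build_ordered_query and read_sorted are hypothetical (they are not part of the BigQuery library), and the sketch assumes client is a google.cloud.bigquery.Client and that the table ID is a fully qualified "project.dataset.table" string:

```python
def build_ordered_query(table_id: str, order_column: str) -> str:
    """Return a SELECT statement that sorts the rows when they are read.

    Hypothetical helper: `table_id` is assumed to be a fully qualified
    "project.dataset.table" string. Backticks quote both identifiers.
    """
    return f"SELECT * FROM `{table_id}` ORDER BY `{order_column}`"


def read_sorted(client, table_id: str, order_column: str = "date"):
    """Run the ordered query and return the result as a pandas DataFrame.

    `client` is assumed to be a google.cloud.bigquery.Client; the order
    is guaranteed by ORDER BY in the query, not by how the table was loaded.
    """
    sql = build_ordered_query(table_id, order_column)
    return client.query(sql).to_dataframe()
```

With the client and output_table from the question, this could be called as read_sorted(client, output_table). The sort order applies only to that query's result, so every query that needs ordered rows has to include its own ORDER BY.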

Upvotes: 1
