I'm trying to write a `pandas.DataFrame` to BigQuery using the Python API, with the records sorted by a column:
```python
import pandas as pd
from google.cloud import bigquery

# project_id and output_table are defined elsewhere
client = bigquery.Client(project=project_id)

df = pd.DataFrame(...)
df.sort_values('date', inplace=True)

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("date", "DATE"),
        bigquery.SchemaField("col1", "INTEGER"),
        bigquery.SchemaField("col2", "INTEGER"),
        bigquery.SchemaField("col3", "STRING"),
    ],
    write_disposition="WRITE_TRUNCATE",
)

job_update = client.load_table_from_dataframe(
    df, output_table, job_config=job_config
)
job_update.result()  # wait for the load job to finish
```
The load job creates the table with the correct values, but the rows are not ordered by `date`. Is there a parameter or method in `job_config` that defines the row order?
As mentioned in the comments on the question, there is no parameter or method in `job_config` that sets the order by a specific column.

As with most relational databases, rows in a BigQuery table should never be considered sorted: sorting the DataFrame before the load does not guarantee the order in which the rows are returned later. If you need the data sorted, specify it with an `ORDER BY` clause in the query you use to retrieve it.
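A minimal sketch of reading the table back in `date` order, assuming `client` is a `google.cloud.bigquery.Client` and the table id is a string such as `project.dataset.table`. The helper names `build_ordered_query` and `read_ordered` are hypothetical, and the column name is interpolated directly into the SQL, so it assumes a trusted identifier rather than user input.

```python
def build_ordered_query(table_id: str, order_column: str = "date") -> str:
    """Return a SELECT that asks BigQuery to return rows sorted by order_column."""
    # Backticks quote the table id (e.g. project.dataset.table).
    # Assumption: order_column is a trusted column name, not user input.
    return f"SELECT * FROM `{table_id}` ORDER BY {order_column}"


def read_ordered(client, table_id: str, order_column: str = "date"):
    """Run the ordered query and return the result as a DataFrame.

    client is assumed to be a google.cloud.bigquery.Client.
    """
    sql = build_ordered_query(table_id, order_column)
    # client.query() starts a query job; to_dataframe() waits for it and
    # downloads the rows in the order produced by ORDER BY.
    return client.query(sql).to_dataframe()
```

With the objects from the question, `read_ordered(client, output_table)` would return the rows sorted by `date`. The ordering lives in the query rather than in the load job, which matches how BigQuery treats table storage as unordered.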