Reputation: 194
I have an ETL process that runs as a Cloud Function; it reads a CSV from a Cloud Storage bucket and writes it to a BigQuery table:
import logging

from google.cloud import bigquery

logger = logging.getLogger(__name__)

def gcs_to_table(uri, table_id, project_id):
    logger.info(f'Writing data from {uri} to {table_id}.')

    # Construct a BigQuery client object.
    client = bigquery.Client(project=project_id)

    job_config = bigquery.LoadJobConfig(
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        allow_jagged_rows=True,
        max_bad_records=100,
    )

    load_job = client.load_table_from_uri(
        uri, table_id, job_config=job_config
    )
    load_job.result()  # Waits for the job to complete.

    destination_table = client.get_table(table_id)
    logger.info(f"Loaded {destination_table.num_rows} rows to table {table_id}.")
This solution has been failing due to exceeding the 100 error limit, usually with an error like this one:
message: Error while reading data, error message: Unable to parse; byte_offset_to_start_of_line: 506943943 column_index: 11 column_name: "SomeColumn" column_type: DOUBLE value: "SomeString"
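(Side note: the byte_offset_to_start_of_line in these messages at least makes it possible to pull just the offending line out of the bucket for a closer look. A rough sketch of that, using placeholder bucket and object names and the offset from the message above:)

from google.cloud import storage

storage_client = storage.Client()
# Placeholder names: swap in the real bucket and object behind `uri`.
blob = storage_client.bucket("my-bucket").blob("path/to/file.csv")

offset = 506943943  # byte_offset_to_start_of_line from the error above
chunk = blob.download_as_bytes(start=offset, end=offset + 2048)
print(chunk.decode("utf-8", errors="replace"))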
"SomeString" belongs in the previous column, so the columns are somehow getting misaligned during the load job. However, using notepad++, I've looked at the raw file and checked for obvious formatting issues with the csv such as mismatched quote characters or commas inside of string values, and I don't see anything that could be causing it. I suspect this has something to do with my configuration and not the format of the CSV since when I run the process using pandas and pandas_gbq on that same CSV it works just fine:
import pandas as pd  # pandas_gbq (and gcsfs for gs:// paths) must also be installed

def gcs_to_table(gcs_object, table_name, project_id):
    logger.info(f'Writing data from {gcs_object} to {table_name}.')
    dat = pd.read_csv(gcs_object)
    dat.to_gbq(table_name, project_id=project_id, if_exists='replace')
    logger.info(f'Wrote DataFrame with shape: {dat.shape}.')
The pandas solution is much slower and can run into timeout issues, so I would prefer to keep using the BigQuery client if I can get these issues resolved.
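One difference between the two paths that seems worth ruling out: pandas' CSV parser accepts newlines inside quoted fields by default, while the BigQuery CSV loader only does so when allow_quoted_newlines is enabled, and a stray embedded newline would shift every following field over, which matches the symptom. This is the config variation I'd try first (sketch only, the rest of the function unchanged):

job_config = bigquery.LoadJobConfig(
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    allow_jagged_rows=True,
    allow_quoted_newlines=True,  # Tolerate newlines inside quoted fields.
    max_bad_records=100,
)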
Any wisdom or even troubleshooting steps are greatly appreciated!
Upvotes: 0
Views: 46