Andres Urrego Angel

Reputation: 1932

GCP BigQuery loading data from a bucket

I'm new to the Google Cloud Platform. I'm trying to populate a BigQuery table with data pulled from a CSV file stored in a bucket. I wrote a Python test script that creates and populates the table. The table creation works, but the load step fails.

My script:

from google.cloud import bigquery
from google.cloud.bigquery import LoadJobConfig
from google.cloud.bigquery import SchemaField

client = bigquery.Client()
dataset_ref = client.dataset('datasetname')


## Create the table

schema = [
    bigquery.SchemaField('start_date', 'DATETIME', mode='NULLABLE'),
    bigquery.SchemaField('start_station_code', 'INTEGER', mode='NULLABLE'),
    bigquery.SchemaField('end_date', 'DATETIME', mode='NULLABLE'),
    bigquery.SchemaField('end_station_code', 'INTEGER', mode='NULLABLE'),
    bigquery.SchemaField('duration_sec', 'INTEGER', mode='NULLABLE'),
    bigquery.SchemaField('is_member', 'INTEGER', mode='NULLABLE')
]
table_ref = dataset_ref.table('tablename')
table = bigquery.Table(table_ref, schema=schema)
table = client.create_table(table)  # API request

## Loading data


SCHEMA = [
    SchemaField('start_date', 'DATETIME', mode='NULLABLE'),
    SchemaField('start_station_code', 'INTEGER', mode='NULLABLE'),
    SchemaField('end_date', 'DATETIME', mode='NULLABLE'),
    SchemaField('end_station_code', 'INTEGER', mode='NULLABLE'),
    SchemaField('duration_sec', 'INTEGER', mode='NULLABLE'),
    SchemaField('is_member', 'INTEGER', mode='NULLABLE')
]
#table_ref = client.dataset('dataset_name').table('table_name')

load_config = LoadJobConfig()
load_config.skip_leading_rows = 1
load_config.schema = SCHEMA
uri = 'gs://gcp-development/object.csv'

load_job = client.load_table_from_uri(
    uri,
    table_ref,
    job_config=load_config)

load_job.result()

destination_table = client.get_table(table_ref)
print('Loaded {} rows.'.format(destination_table.num_rows))

Per the documentation, that seems correct. However, I'm getting the following error, which I don't understand, and I don't know how to check the logs for more details.

The error:

google.api_core.exceptions.BadRequest: 400 Error while reading data, error message: CSV table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the error stream for more details.

Where is the error stream? Following the troubleshooting documentation, I tried:

GET https://www.googleapis.com/bigquery/v2/projects/projectId/queries/jobId 

but it returned nothing.

Thanks for your help.

Upvotes: 1

Views: 1124

Answers (1)

F10

Reputation: 2893

I was able to load data using your script with no issues at all. You can check the full error message in the Logs Viewer by selecting BigQuery in the left-hand menu; the failure may be a parsing error in the DATETIME fields. You can find more information about how to use the Logs Viewer in this document.
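
If you prefer to stay in Python, the error stream is also exposed on the job object itself: google.cloud.bigquery job objects carry an errors property once the job finishes. A minimal sketch, reusing the placeholder dataset, table, and URI names from your script (adjust them to your project, and note it assumes the table already exists):

from google.cloud import bigquery
from google.api_core.exceptions import BadRequest

client = bigquery.Client()
table_ref = client.dataset('datasetname').table('tablename')

job_config = bigquery.LoadJobConfig()
job_config.skip_leading_rows = 1  # schema is taken from the existing table

load_job = client.load_table_from_uri(
    'gs://gcp-development/object.csv',
    table_ref,
    job_config=job_config)

try:
    load_job.result()  # wait for the load to complete
except BadRequest:
    # Each entry in the job's error stream is a dict with
    # 'reason', 'location', and 'message' keys.
    for err in load_job.errors:
        print(err)

Note also that job details are served by the jobs.get method (GET .../projects/projectId/jobs/jobId), not /queries/jobId, which may be why the REST call in the question returned nothing.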

The data sample that I used was the following:

start_date,start_station_code,end_date,end_station_code,duration_sec,is_member
0001-01-01 00:00:00.000000,1,9999-12-31 23:59:59.999999,2,3,4
0001-01-01 00:00:00.000000,2,9999-12-31 23:59:59.999999,3,4,5
0001-01-01 00:00:00.000000,3,9999-12-31 23:59:59.999999,4,5,6
0001-01-01 00:00:00.000000,4,9999-12-31 23:59:59.999999,5,6,7
0001-01-01 00:00:00.000000,5,9999-12-31 23:59:59.999999,6,7,8
0001-01-01 00:00:00.000000,6,9999-12-31 23:59:59.999999,7,8,9
0001-01-01 00:00:00.000000,7,9999-12-31 23:59:59.999999,8,9,10
0001-01-01 00:00:00.000000,8,9999-12-31 23:59:59.999999,9,10,11
0001-01-01 00:00:00.000000,9,9999-12-31 23:59:59.999999,10,11,12
0001-01-01 00:00:00.000000,10,9999-12-31 23:59:59.999999,11,12,13
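
If the Logs Viewer points at the DATETIME columns, a quick local check can flag the offending rows before re-running the load. A minimal sketch, assuming you have downloaded the bucket object to a local object.csv (e.g. with gsutil cp); BigQuery parses CSV DATETIME values as YYYY-MM-DD HH:MM:SS with an optional fractional-seconds part, which is what the two formats below encode:

import csv
from datetime import datetime

with open('object.csv', newline='') as f:
    reader = csv.DictReader(f)
    for line_num, row in enumerate(reader, start=2):  # line 1 is the header
        for col in ('start_date', 'end_date'):
            value = row[col]
            for fmt in ('%Y-%m-%d %H:%M:%S.%f', '%Y-%m-%d %H:%M:%S'):
                try:
                    datetime.strptime(value, fmt)
                    break
                except ValueError:
                    continue
            else:
                # Neither accepted layout matched; this row would make
                # the load job fail with a parse error.
                print('Line {}: unparseable {} value {!r}'.format(
                    line_num, col, value))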

Upvotes: 1
