jefafa
jefafa

Reputation: 11

BigQuery insert dates into 'DATE' type field using Python Google Cloud library

I'm using Python 2.7 and the Google Cloud Client Library for Python (v0.27.0) to insert data into a BigQuery table (using table.insert_data()).

One of the fields in my table has type 'DATE'.

In my Python script I've formatted the date-data as 'YYYY-MM-DD', but unfortunately the Google Cloud library returns an 'Invalid date:' error for that field.

I've tried formatting the date-field in many ways (i.e. 'YYYYMMDD', timestamp etc.), but no luck so far...

Unfortunately the API docs (https://googlecloudplatform.github.io/google-cloud-python/latest/) don't mention anything about the required date format/type/object in Python.

This is my code:

from google.cloud import bigquery
import pandas as pd
import json
from pprint import pprint
from collections import OrderedDict

# Using a pandas dataframe 'df' as input

# Converting date field to YYYY-MM-DD format

df['DATE_VALUE_LOCAL'] = df['DATE_VALUE_LOCAL'].apply(lambda x: x.strftime('%Y-%m-%d'))

# Converting pandas dataframe to json

json_data = df.to_json(orient='records',date_format='iso')

# Instantiates a client
bigquery_client = bigquery.Client(project="xxx")

# The name for the new dataset
dataset_name = 'dataset_name'
table_name = 'table_name'

def stream_data(dataset_name, table_name, json_data):
    dataset = bigquery_client.dataset(dataset_name)
    table = dataset.table(table_name)

    data = json.loads(json_data, object_pairs_hook=OrderedDict)

    # Reload the table to get the schema.
    table.reload()

    errors = table.insert_data(data)

    if not errors:
        print('Loaded 1 row into {}:{}'.format(dataset_name, table_name))
    else:
        print('Errors:')
        pprint(errors)

stream_data(dataset_name, table_name, json_data)

What is the required Python date format/type/object to insert my dates into a BigQuery DATE field?

Upvotes: 1

Views: 12597

Answers (1)

Willian Fuks
Willian Fuks

Reputation: 11777

I just simulated your code here and everything worked fine. Here's what I've simulated:

import pandas as pd
import json
import os
from collections import OrderedDict
from google.cloud.bigquery import Client

d = {'ed': ['3', '5'],
     'date': ['2017-10-11', '2017-11-12']}
json_data = df.to_json(orient='records', date_formate='iso')
json_data = json.loads(json_data, object_pairs_hook=OrderedDict)

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/key.json'
bc = Client()
ds = bc.dataset('dataset name')
table = ds.table('table I just created')
table = bc.get_table(table)
bc.create_rows(table, json_data)

It's using version 0.28.0 but still it's the same methods from previous versions.

You probably have some mistake going on in some step that maybe is converting date to some other unidentifiable format for BQ. Try using this script as reference to see where the mistake might be happening in your code.

Upvotes: 2

Related Questions