Oleh Omelchenko
Oleh Omelchenko

Reputation: 122

Method to_gbq() in pandas-gbq library strips milliseconds and seconds part of datetime field

I'm using version 0.3.1. The simplest way to reproduce:

import pandas as pd, pandas_gbq
data = {"date": datetime.datetime.now()} # 2018-03-16 16:09:03.230384
df = pd.DataFrame(data, index=[0]

destination_table = 'test_tables.test_datetime'
project_id = 'my-project-11111'
private_key = 'path-to-key.json'
pandas_gbq.to_gbq(
  df,
  destination_table,
  project_id,
  private_key=private_key
)

When I go to check the values in the created table in Google BigQuery, the value doesn't keep seconds and milliseconds part of the DateTime: 2018-03-16 16:09:00.000 UTC

The documentation doesn't say anything about this behavior, so I assume this is a bug. But maybe I'm missing something here?

Upvotes: 0

Views: 1597

Answers (1)

Tudormi
Tudormi

Reputation: 1112

I have reproduced your issue and decided to compare the pandas_gbq approach versus the classic bigquery python client library:

from google.cloud import bigquery
import pandas as pd, pandas_gbq, datetime
data = {"date": datetime.datetime.now()}
print data
df = pd.DataFrame(data, index=[0])
print df

destination_table = 'test_tables.test_datetime'
project_id = 'example-project'
private_key = '/home/Workspace/example-service-account.json'
pandas_gbq.to_gbq(
  df,
  destination_table,
  project_id,
  private_key=private_key,
  if_exists='append'
)
# ^What gets inserted with the above method loses the milliseconds^


# BigQuery client library attempt

bigquery_client = bigquery.Client()

dataset_id = 'test_tables'
dataset_ref = bigquery_client.dataset(dataset_id)
table_ref = dataset_ref.table('test_datetime')

df.to_csv('/home/Workspace/test.csv')
# You need to cleanup the CSV,to respect the table schema

with open('/home/Workspace/test.csv', 'rb') as source_file:
  job_config = bigquery.LoadJobConfig()
  job_config.source_format = 'text/csv'
  job = bigquery_client.load_table_from_file(source_file, table_ref, job_config=job_config)

job.result()    
print('Loaded {} rows into {}:{}.'.format(job.output_rows, dataset_id, 'test_datetime'))
# ^The timestamps inserted from the *.csv by the client library keep their millisecond information^

I believe this is an issue isolated in the pandas_gbq library, so I suggest posting/searching for it in their Github issue tracker.

Upvotes: 2

Related Questions