Harunobu

Reputation: 61

PyMongo: NaTType ValueError on Bulk Insert to New Collection

I am trying to use PyMongo to upload a mixed set of date and text data to a new collection in my remote MongoDB server.

However, I'm getting an error caused by null values mixed in with dates, i.e. rows where there is a None value instead of a datetime.datetime() object.

As some background: The raw data is stored in a CSV file, which I'm reading into a pandas.DataFrame() using pandas.read_csv(). Once I have the data in pandas, I do some rudimentary cleaning before transforming the data into a list of dictionaries, which I then upload to the collection using the standard collection.insert_many() method.

Initially, the values in each row / document / dictionary are stored as strings. However, before uploading the data, I convert a number of date columns into datetime objects by calling datetime.datetime.strptime() on each value. Not every dictionary has these date fields populated, though. For these dictionaries, I simply use a None instead of a datetime object.

The resulting data that I'm trying to upload, then, is a list of dictionaries with a number of NoneType values mixed in, and when I call insert_many() I get this:

ValueError: NaTType does not support utcoffset.

I'm not familiar with utcoffset, and my attempts to research this have confounded me.

Has anyone run into this issue, or does anyone have suggestions on how to handle missing datetime data in PyMongo?

Here's my code:

import datetime as dt  # needed for dt.datetime.strptime() below

import pandas as pd
import pymongo

source = '/path/to/data'
sampleData = pd.read_csv(source, dtype=str)

Date_Columns = [
    'date_a',
    'date_b',
    'date_c',
    'date_d'
]
cleanData = sampleData
for col in Date_Columns:

    # Convert the strings to datetime objects for each column.
    # If a value is null, then use a None object instead of a datetime.
    Strings = sampleData[col].values
    Formats = [dt.datetime.strptime(d, '%m/%d/%Y') if isinstance(d, str) else None for d in Strings]
    cleanData[col] = Formats

client = pymongo.MongoClient('XX.XX.XX.XX', 99999)
db = client['my_db']
c = db['my_collection']

# Convert the cleaned DataFrame into a list of dictionaries.
Keys = [key for key in sampleData.columns.values]
Data = [dict(zip(Keys, L)) for L in sampleData.values]

c.insert_many(Data)

And full traceback:

Traceback (most recent call last):
  File "/Users/haru/my_git/projects/pipeline/stable/sofla_permits_sunnyisles.py", line 738, in <module>
    setup_db()
  File "/Users/haru/my_git/projects/pipeline/stable/sofla_permits_sunnyisles.py", line 679, in setup_db
    c.insert_many(Data)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pymongo/collection.py", line 753, in insert_many
    blk.execute(write_concern, session=session)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pymongo/bulk.py", line 513, in execute
    return self.execute_command(generator, write_concern, session)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pymongo/bulk.py", line 338, in execute_command
    self.is_retryable, retryable_bulk, s, self)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pymongo/mongo_client.py", line 1196, in _retry_with_session
    return func(session, sock_info, retryable)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pymongo/bulk.py", line 333, in retryable_bulk
    retryable, full_result)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pymongo/bulk.py", line 285, in _execute_command
    self.collection.codec_options, bwc)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pymongo/message.py", line 1273, in _do_bulk_write_command
    namespace, operation, command, docs, check_keys, opts, ctx)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pymongo/message.py", line 1263, in _do_batched_write_command
    namespace, operation, command, docs, check_keys, opts, ctx)
  File "pandas/_libs/tslibs/nattype.pyx", line 59, in pandas._libs.tslibs.nattype._make_error_func.f
ValueError: NaTType does not support utcoffset

Upvotes: 3

Views: 2582

Answers (1)

C.Nivs

Reputation: 13126

Most servers keep their clocks on UTC, which is ideal. Internally the time is usually stored as Unix time, an integer count of seconds since 1970-01-01 00:00:00 UTC. What this means is that your schedules for processes don't rely on local time, which avoids the massive headache that is Daylight Saving Time.

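The UTC offset is the difference between a local time and UTC. US Eastern time, for example, is 4 to 5 hours behind UTC: UTC-5 on standard time (EST) and UTC-4 on daylight time (EDT).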
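To make utcoffset concrete, here is a small illustration using only the standard library. The fixed EST and EDT offsets and the sample dates are assumptions chosen for the example; real code would normally use a DST-aware zone database instead of hard-coded offsets.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed offsets for US Eastern time, for illustration only.
EST = timezone(timedelta(hours=-5), "EST")
EDT = timezone(timedelta(hours=-4), "EDT")

naive = datetime(2019, 1, 15, 12, 0)               # no tzinfo attached
winter = datetime(2019, 1, 15, 12, 0, tzinfo=EST)  # noon, standard time
summer = datetime(2019, 7, 15, 12, 0, tzinfo=EDT)  # noon, daylight time

# utcoffset() returns None for a naive datetime and a timedelta otherwise.
naive_offset = naive.utcoffset()
winter_offset = winter.utcoffset()
summer_offset = summer.utcoffset()

# Converting to UTC applies the offset: noon EST is 17:00 UTC.
winter_as_utc = winter.astimezone(timezone.utc)

print(naive_offset, winter_offset, summer_offset, winter_as_utc)
```

An ordinary datetime object answers utcoffset() without complaint, even when it is naive. The error in the question comes from a value that refuses to answer it at all.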

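Looking at your traceback, this is a pandas error. When you assign a list of datetime objects and None values to a DataFrame column, pandas converts the column to its own datetime type and stores each None as pandas.NaT ("not a time"). NaT does not play nicely with code that expects a datetime.datetime: PyMongo asks it for its utcoffset() while encoding the document, and NaT raises the ValueError you see. Convert the NaT values back to None before inserting, or convert the column to datetime strings of the required precision. Either should avoid this error.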
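A minimal sketch of one way to avoid the error, assuming the missing dates ended up in the DataFrame as pandas.NaT. The helper name to_mongo_records and the sample frame are hypothetical, made up for this example. The idea is to replace every missing value with None and turn each pandas Timestamp into a plain datetime.datetime before the documents reach insert_many().

```python
import datetime as dt

import pandas as pd


def to_mongo_records(df):
    """Hypothetical helper: DataFrame -> list of dicts without NaT/NaN."""
    records = []
    for row in df.to_dict(orient="records"):
        doc = {}
        for key, value in row.items():
            if pd.isna(value):
                # Covers NaT, NaN and None: store an explicit null.
                doc[key] = None
            elif isinstance(value, pd.Timestamp):
                # Hand the driver a plain datetime.datetime.
                doc[key] = value.to_pydatetime()
            else:
                doc[key] = value
        records.append(doc)
    return records


# Made-up sample data: one populated date and one missing date.
frame = pd.DataFrame({
    "name": ["a", "b"],
    "date_a": pd.to_datetime(["01/15/2019", None], format="%m/%d/%Y"),
})

docs = to_mongo_records(frame)
print(docs)
```

The resulting list can be passed to insert_many() in place of the Data list from the question. Because pd.isna() also matches NaN, empty text cells read by read_csv() become None as well.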

Upvotes: 3
