archienorman
archienorman

Reputation: 1454

Pymongo - ValueError: NaTType does not support utcoffset when using insert_many

I am trying to incrementally copy documents from one database to another.

Some fields contain date time values in the following format:

2016-09-22 00:00:00

while others are in this format:

2016-09-27 09:03:08.988

I extract and insert the documents like so:

pd.DataFrame(list(db_prod.db_name.collction_name.find({'_updated_at': {'$gt': last_added_timestamp}}).sort('_updated_at', 1)))
add = (df.to_dict('records'))

try:
    db_incremental.other_db.collection_name.insert_many(add)
except BulkWriteError as bwe:
    print(bwe.details)

here is the error:

  File "/usr/local/lib/python2.7/dist-packages/pymongo/collection.py", line 684, in insert_many
    blk.execute(self.write_concern.document)
  File "/usr/local/lib/python2.7/dist-packages/pymongo/bulk.py", line 470, in execute
    return self.execute_command(sock_info, generator, write_concern)
  File "/usr/local/lib/python2.7/dist-packages/pymongo/bulk.py", line 302, in execute_command
    run.ops, True, self.collection.codec_options, bwc)
  File "pandas/tslib.pyx", line 663, in pandas.tslib._make_error_func.f (pandas/tslib.c:14736)
ValueError: NaTType does not support utcoffset

I dont actually need to modify the timestamps, just insert them as they are.

Any help appreciated.

Upvotes: 8

Views: 19046

Answers (4)

Parth chokhra
Parth chokhra

Reputation: 111

def new_replace(k):
    return k.replace(tzinfo=None)
    

df[time_column]= df.apply(lambda row: new_replace(row[time_column]),axis = 1)

This worked in my case. You can also add try except in the new_replace function according to your case.

Upvotes: 0

Deadelina
Deadelina

Reputation: 51

the cell probably don't have the same datetime format, you should standardize it first using pandas.DataFrame.apply, here is the example how to do it.

import datetime as dt

def handleString(probably_string):
    # string pattern: 2016-09-27 09:03:08.988
    try:
        _date, _time = probably_string.split(' ')
        _year, _month, _day = (int(x) for x in _date.plit('-'))
        _hour, _minute, _second = (int(x) for x in _time.split(':'))
        return dt.datetime(_year, _month, _day, _hour, _minute, _second)
    except AttributeError:
        # it's NoneType oject
        # but you should return datetime object for mongodb datetime field
        return dt.datetime(1970,1,1)
    except ValueError:
        # not enough values to unpack
        # but you should return datetime object for mongodb datetime field
        return dt.datetime(1970,1,1)

def formatTime(row, column_name):
    datetime_cell = row[column_name]
    try:
        _second = datetime_cell.second
        return datetime_cell.replace(second=_second, microsecond=0)
    catch AttributeError:
        return handleString(datetime_cell)

time_column = 'time_field'
df[time_column] = df.apply(lambda row: formatTime(row, time_column), axis='columns')

Upvotes: 1

R.singh
R.singh

Reputation: 279

Apparently it worked for me by using

df.fillna("-",inplace=True)

Upvotes: 0

gies0r
gies0r

Reputation: 5239

Replace it with None values which can be interpreted by pandas

df[['_updated_at']] = df[['_updated_at']].astype(object).where(df[['_updated_at']].notnull(), None)

Upvotes: 4

Related Questions