Reputation: 1454
I am trying to incrementally copy documents from one database to another.
Some fields contain date time values in the following format:
2016-09-22 00:00:00
while others are in this format:
2016-09-27 09:03:08.988
I extract and insert the documents like so:
pd.DataFrame(list(db_prod.db_name.collction_name.find({'_updated_at': {'$gt': last_added_timestamp}}).sort('_updated_at', 1)))
add = (df.to_dict('records'))
try:
db_incremental.other_db.collection_name.insert_many(add)
except BulkWriteError as bwe:
print(bwe.details)
here is the error:
File "/usr/local/lib/python2.7/dist-packages/pymongo/collection.py", line 684, in insert_many
blk.execute(self.write_concern.document)
File "/usr/local/lib/python2.7/dist-packages/pymongo/bulk.py", line 470, in execute
return self.execute_command(sock_info, generator, write_concern)
File "/usr/local/lib/python2.7/dist-packages/pymongo/bulk.py", line 302, in execute_command
run.ops, True, self.collection.codec_options, bwc)
File "pandas/tslib.pyx", line 663, in pandas.tslib._make_error_func.f (pandas/tslib.c:14736)
ValueError: NaTType does not support utcoffset
I dont actually need to modify the timestamps, just insert them as they are.
Any help appreciated.
Upvotes: 8
Views: 19046
Reputation: 111
def new_replace(k):
return k.replace(tzinfo=None)
df[time_column]= df.apply(lambda row: new_replace(row[time_column]),axis = 1)
This worked in my case. You can also add try except in the new_replace function according to your case.
Upvotes: 0
Reputation: 51
the cell probably don't have the same datetime format, you should standardize it first using pandas.DataFrame.apply, here is the example how to do it.
import datetime as dt
def handleString(probably_string):
# string pattern: 2016-09-27 09:03:08.988
try:
_date, _time = probably_string.split(' ')
_year, _month, _day = (int(x) for x in _date.plit('-'))
_hour, _minute, _second = (int(x) for x in _time.split(':'))
return dt.datetime(_year, _month, _day, _hour, _minute, _second)
except AttributeError:
# it's NoneType oject
# but you should return datetime object for mongodb datetime field
return dt.datetime(1970,1,1)
except ValueError:
# not enough values to unpack
# but you should return datetime object for mongodb datetime field
return dt.datetime(1970,1,1)
def formatTime(row, column_name):
datetime_cell = row[column_name]
try:
_second = datetime_cell.second
return datetime_cell.replace(second=_second, microsecond=0)
catch AttributeError:
return handleString(datetime_cell)
time_column = 'time_field'
df[time_column] = df.apply(lambda row: formatTime(row, time_column), axis='columns')
Upvotes: 1
Reputation: 5239
Replace it with None values which can be interpreted by pandas
df[['_updated_at']] = df[['_updated_at']].astype(object).where(df[['_updated_at']].notnull(), None)
Upvotes: 4