I have a dataframe that contains data I want to upload into MongoDB. Below is the data:
MongoRow = pd.DataFrame.from_dict({'school': {1: schoolID}, 'student': {1: student}, 'date': {1: dateToday}, 'Probability': {1: probabilityOfLowerThanThreshold}})
                     school                   student        date  Probability
1  5beee5678d62101c9c4e7dbb  5bf3e06f9a892068705d8420  2020-03-27     0.000038
I have the following code, which checks whether a document in Mongo already has the same student ID and date; if it doesn't, it adds the row:
def getPredictions(school):
    schoolDB = DB[school['database']['name']]
    schoolPredictions = schoolDB['session_attendance_predicted']
    Predictions = schoolPredictions.aggregate([{
        '$project': {
            'school': '$school',
            'student': '$student',
            'date': '$date'
        }
    }])
    return list(Predictions)
Predictions = getPredictions(school)
Predictions = pd.DataFrame(Predictions)
schoolDB = DB[school['database']['name']]
collection = schoolDB['session_attendance_predicted']
import json
for i in Predictions.index:
    schoolOld = Predictions.loc[i,'school']
    studentOld = Predictions.loc[i,'student']
    dateOld = Predictions.loc[i,'date']
    if(studentOld == student and date == dateOld):
        print("Student Exists")
        #UPDATE THE ROW WITH NEW VALUES
    else:
        print("Student Doesn't Exist")
        records = json.loads(df.T.to_json()).values()
        collection.insert(records)
However, if it does exist, I want to update the row with the new values. Does anyone know how to do this? I have looked at pymongo's upsert option, but I'm not sure how to use it. Can anyone help?
'''''''UPDATE'''''''
The above is partly working now, however, I am now getting an error with the following code:
dateToday = datetime.datetime.combine(dateToday, datetime.time(0, 0))
MongoRow = pd.DataFrame.from_dict({'school': {1: schoolID}, 'student': {1: student}, 'date': {1: dateToday}, 'Probability': {1: probabilityOfLowerThanThreshold}})
data_dict = MongoRow.to_dict()
for i in Predictions.index:
    print(Predictions)
    collection.replace_one({'student': student, 'date': dateToday}, data_dict, upsert=True)
Error:
InvalidDocument: documents must have only string keys, key was 1
Upvotes: 13
Views: 35055
Reputation: 7276
A number of people are probably going to be confused by the accepted answer, as it suggests using replace_one with the upsert flag.
Upserting means 'update or insert' (up = update, sert = insert). Most people looking to 'upsert' should be using update_one with the upsert flag: replace_one overwrites the entire matched document, whereas update_one with $set changes only the fields you pass.
For example:
collection.update_one({'matchable_field': field_data_to_match}, {"$set": upsertable_data}, upsert=True)
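As a rough sketch of how this could look for the question's data (not a definitive implementation): upsert_record below is a hypothetical helper name, and it assumes each record is a flat dict with string keys and that student and date together identify a row, as in the question. It matches on those key fields and passes the remaining fields to $set, so any other fields already stored on the document are left alone.

```python
def upsert_record(collection, record, key_fields=("student", "date")):
    """Update the document matching key_fields, or insert it if none matches.

    `collection` is expected to behave like a pymongo Collection.
    `record` is assumed to be a flat dict with string keys (hypothetical shape).
    """
    # Match only on the identifying fields.
    query = {field: record[field] for field in key_fields}
    # Everything else goes into $set; on insert, MongoDB also copies the
    # equality fields from the query into the new document.
    changes = {k: v for k, v in record.items() if k not in key_fields}
    return collection.update_one(query, {"$set": changes}, upsert=True)
```

With the question's variables, a call might look like upsert_record(collection, {'school': schoolID, 'student': student, 'date': dateToday, 'Probability': probabilityOfLowerThanThreshold}).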
Upvotes: 22
Reputation: 8814
To upsert, you cannot use insert() (deprecated), insert_one() or insert_many(). You must use one of the collection-level methods that support upserting.
To get started, I would point you towards reading the dataframe line by line and using replace_one() on each line. There are more advanced ways of doing this, but this is the easiest.
Your code will look a bit like:
collection.replace_one({'student': student, 'date': date}, record, upsert=True)
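A minimal sketch of the line-by-line approach, under some assumptions. The InvalidDocument error in the question comes from the shape of DataFrame.to_dict() with its default orientation, which nests values as {column: {row_index: value}}, so the integer index 1 ends up as a document key. Calling MongoRow.to_dict('records') instead gives one flat dict per row. The two helpers below are hypothetical names: column_dict_to_records shows the same reshaping in plain Python, and replace_rows assumes student and date identify a row.

```python
def column_dict_to_records(column_dict):
    """Turn {column: {row_index: value}} into a list of {column: value} rows."""
    rows = {}
    for column, values in column_dict.items():
        for row_index, value in values.items():
            # Group values by row so each row becomes one flat document.
            rows.setdefault(row_index, {})[column] = value
    return list(rows.values())


def replace_rows(collection, records, key_fields=("student", "date")):
    """Replace (or insert) one document per record on a pymongo-style collection."""
    for record in records:
        query = {field: record[field] for field in key_fields}
        # replace_one swaps the whole matched document for `record`;
        # upsert=True inserts it when nothing matches.
        collection.replace_one(query, record, upsert=True)
```

Note that replace_one discards any fields on the stored document that are not in the record; if those need to survive, update_one with $set is the safer choice.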
Upvotes: 10