Reputation: 27
I currently have a dictionary with data being pulled from an API, where I have given each datapoint it's own variable (job_id, jobtitle, company etc.):
output = {
'ID': job_id,
'Title': jobtitle,
'Employer' : company,
'Employment type' : emptype,
'Fulltime' : tid,
'Deadline' : deadline,
'Link' : webpage
}
that I want to add to my database, easy enough:
db.jobs.insert_one(output)
but this is all in a for loop that will create 30-ish unique new documents, with names, titles, links and whatnot, this script will be run more than once, so what I would like for it to do is only insert the "output" as a document if it doesn't already exist in the database, all of these new documents do have their own unique ID's coming from the job_id variable am I able to check against that?
Upvotes: 2
Views: 4833
Reputation: 8814
EDIT:
replace
db.jobs.insert_one(output)
with
db.jobs.replace_one({'ID': job_id}, output, upsert=True)
ORIGINAL ANSWER with worked example:
Use replace_one()
with upsert=True
. You can run this multiple times and it will with insert if the ID
isn't found or replace if it is found. It wasn't quite what you were asking as the data is always updated (so newer data will overwrite any existing data).
from pymongo import MongoClient
db = MongoClient()['mydatabase']
for i in range(30):
db.employer.replace_one({'ID': i},
{
'ID': i,
'Title': 'jobtitle',
'Employer' : 'company',
'Employment type' : 'emptype',
'Fulltime' : 'tid',
'Deadline' : 'deadline',
'Link' : 'webpage'
}, upsert=True)
# Should always print 30 regardless of number of times run.
print(db.employer.count_documents({}))
Upvotes: 1
Reputation: 17915
You need to try two things :
1) Doing .find()
& if no document found for given job_id
then writing to DB is a two way call - Instead you can have an unique-index on job_id
field, that will throw an error if your operation tries to insert duplicate document (Having unique index is much more safer way to avoid duplicates, even helpful if your code logic fails).
2) If you've 30 dict's - You no need to iterate for 30 times & use insert_one
to make 30 database calls, instead you can use insert_many which takes in an array of dict's & writes to database.
Note : By default all dict's are written in the order they're in the array, in case if a dict fails cause of duplicate error then insert_many
fails at that point without inserting rest others, So to overcome this you need to pass an option
ordered=False
that way all dictionaries will be inserted except duplicates.
Upvotes: 3