Derpa
Derpa

Reputation: 27

Python3, MongoDB Insert only if document does not exist

I currently have a dictionary with data being pulled from an API, where I have given each datapoint it's own variable (job_id, jobtitle, company etc.):

output = {
        'ID': job_id, 
        'Title': jobtitle, 
        'Employer' : company, 
        'Employment type' : emptype, 
        'Fulltime' : tid, 
        'Deadline' : deadline, 
        'Link' : webpage
}

that I want to add to my database, easy enough:

db.jobs.insert_one(output)

but this is all in a for loop that will create 30-ish unique new documents, with names, titles, links and whatnot, this script will be run more than once, so what I would like for it to do is only insert the "output" as a document if it doesn't already exist in the database, all of these new documents do have their own unique ID's coming from the job_id variable am I able to check against that?

Upvotes: 2

Views: 4833

Answers (2)

Belly Buster
Belly Buster

Reputation: 8814

EDIT:

replace

db.jobs.insert_one(output)

with

db.jobs.replace_one({'ID': job_id}, output, upsert=True)

ORIGINAL ANSWER with worked example:

Use replace_one() with upsert=True. You can run this multiple times and it will with insert if the ID isn't found or replace if it is found. It wasn't quite what you were asking as the data is always updated (so newer data will overwrite any existing data).

from pymongo import MongoClient


db = MongoClient()['mydatabase']

for i in range(30):
    db.employer.replace_one({'ID': i},
    {
            'ID': i,
            'Title': 'jobtitle',
            'Employer' : 'company',
            'Employment type' : 'emptype',
            'Fulltime' : 'tid',
            'Deadline' : 'deadline',
            'Link' : 'webpage'
    }, upsert=True)

# Should always print 30 regardless of number of times run.
print(db.employer.count_documents({}))

Upvotes: 1

whoami - fakeFaceTrueSoul
whoami - fakeFaceTrueSoul

Reputation: 17915

You need to try two things :

1) Doing .find() & if no document found for given job_id then writing to DB is a two way call - Instead you can have an unique-index on job_id field, that will throw an error if your operation tries to insert duplicate document (Having unique index is much more safer way to avoid duplicates, even helpful if your code logic fails).

2) If you've 30 dict's - You no need to iterate for 30 times & use insert_one to make 30 database calls, instead you can use insert_many which takes in an array of dict's & writes to database.

Note : By default all dict's are written in the order they're in the array, in case if a dict fails cause of duplicate error then insert_many fails at that point without inserting rest others, So to overcome this you need to pass an option ordered=False that way all dictionaries will be inserted except duplicates.

Upvotes: 3

Related Questions