lee
lee

Reputation: 69

mongodb replace_one with upsert=True upserts when it should replace. How to fix?

Background: This is my first time working with mongodb on a project. I am using pymongo and fastapi to insert a document representing a location into my alllocations collection. For this project I am pulling locations from other APIs on the internet and trying to keep from adding the same location multiple times. For this, each online API has been given a unique string variable: foreignAPIAuthority and each foreignAPIAuthority gives a locationID uuid string to each location json document they hand out. The combination of these two values keeps two companies/organizations from having locationID uuid conflicts.

Issue: I have created an /addone function that is meant to be handed a JSON object representing a location. That location, very simply needs to be added to the locations.alllocations collection without duplicating any location. So add it if it isn't there, don't add it if it already is. Originally I was using insert_one to add these objects then discovered replace_one(upsert=True) which very neatly will replace a document if any are returned by the filter (so I can update existing documents just by adding the new version), but if no document matches the filter, then the replacement is inserted. This single function should give the desired insert without duplication functionality. It does for most of my testing but in a test today I discovered it didn't work as advertised and I'm wondering what went wrong.

Test: Create 4 documents. All documents come from the same 'TEST' foreignAPIAuthority but documents 1 and 3 should have the same locationID and documents 2 and 4 should have the same locationID. Subsequently send each document to the /addone endpoint. If replace_one(upsert=True) works as advertized then with an intially empty collection, documents 1 and 2 should be upserted because they didn't match anything then documents 3 and 4 should replace 1 and 2 because they have the same foreignAPIAuthority/locationID pairs (respectively).

Results: But that's not what happened in today's test. Note my python http POSTs are async and went in the order 1,2,4,3 but this should not be relevant. Documents 1 and 2 got upserted as expected. Then document 4 was POSTed but was also upserted. Then document 3 was POSTed and correclty replaced document 1.

My question: What happened that despite each document being added one by one and doc 4 came after doc 2 that doc 4 did not replace document 2 but was instead inserted replacing doc 2? The only thing I can think of is write-concern but I'm only using a single node for development so propogation does not need to take place. I am also using wiredtiger journaling with commitIntervalMs: 400. Is this the problem? Does it need to be shorter? I have not modified the default write-concern policy. Could it be something else?

Edit: I should clarify that mongodb is running synchronously and so is pymongo. PyMongo is inherently synchronous. I am running a single locations authority service to which these /addone calls are being made from a different python test script. The http calls from the test script to the locations authority service are asynchronous async/await. The pymongo calls from the locations authority to mongodb are all synchronous. The order in which the asynchronous request are made can be checked in the docker logs for the locations authroity service. I have confirmed the order the docs are POSTed in is 1,2,4,3. From there they are (should be) synchronously replaced/upserted by pymongo.

Thank you for your help ahead of time. Code is attatched below.

mongod.yaml

net:
  port: 27017

storage:
   dbPath: /var/lib/mongodb
   journal:
      enabled: true
      commitIntervalMs: 400
   directoryPerDB: true

/addone endpoint:

class Location(BaseModel):
    foreignAPIAuthority: str
    datetime: str
    locationID: str
    chain: str
    name: str
    address: dict
    latlong: list # [lat, long] = [float, float]
    hours: dict
    phone: str

# Insert a single location document that conforms to the Location data model
@app.post("/addone")
async def addone( location: Location ):
    print( "addone:\n{}".format(location.__dict__) )
    # Using replace_one(upsert=True) instead of insert_one ensures duplicates aren't created for the same foreignAPIAuthority,locationID pair.
    ret = alllocs.replace_one(
        { 
            'foreignAPIAuthority': location.foreignAPIAuthority,
            'locationID': location.locationID
        },
        location.__dict__,
        upsert=True
    )
    if ret.upserted_id is not None:
        resp = str(ret.upserted_id)
        print("upserted_id: "+resp)
    else:
        resp = str(alllocs.find_one({'foreignAPIAuthority': location.foreignAPIAuthority, 'locationID': location.locationID})['_id'])
        print("modified item id: "+str(resp))
        
    return Response(content= json.dumps(resp), media_type="application/json")

Docker logs from the test with personal data changed:

locations-locationsapi-1  | adding:
locations-locationsapi-1  | {'foreignAPIAuthority': 'TEST', 'datetime': '2022-01-02T14:22:29.853051', 'locationID': '123456', 'chain': 'TEST', 'name': 'TEST in the ocean :)', 'address': {'street': '123 Main street', 'city': 'Huntsville', 'county': 'Madison', 'zipcode': '12345', 'state': 'AL'}, 'latlong': [1e-05, 1e-05], 'hours': {'sunday': {'open': '00:00', 'close': '00:00', 'open24': False}, 'monday': {'open': '07:00', 'close': '20:00', 'open24': False}}, 'phone': '256-123-1234'}
locations-locationsapi-1  | upserted_id: 61d2098e21b6b4df497fe747
locations-locationsapi-1  | 172.22.0.1:36656 - "POST /addone HTTP/1.1" 200
locations-locationsapi-1  | adding:
locations-locationsapi-1  | {'foreignAPIAuthority': 'TEST', 'datetime': '2021-12-26T14:22:28.853051', 'locationID': '123457', 'chain': 'TEST', 'name': 'TEST in the ocean :)', 'address': {'street': '123 Main street', 'city': 'Huntsville', 'county': 'Madison', 'zipcode': '12345', 'state': 'AL'}, 'latlong': [2e-05, 2e-05], 'hours': {'sunday': {'open': '00:00', 'close': '00:00', 'open24': False}, 'monday': {'open': '07:00', 'close': '20:00', 'open24': False}}, 'phone': '256-123-1234'}
locations-locationsapi-1  | upserted_id: 61d2098e21b6b4df497fe74a
locations-locationsapi-1  | 172.22.0.1:36654 - "POST /addone HTTP/1.1" 200
locations-locationsapi-1  | adding:
locations-locationsapi-1  | {'foreignAPIAuthority': 'TEST', 'datetime': '2021-12-26T14:22:28.853051', 'locationID': '123457', 'chain': 'TEST', 'name': 'TEST in the ocean :)', 'address': {'street': '123 Main street', 'city': 'Huntsville', 'county': 'Madison', 'zipcode': '12345', 'state': 'AL'}, 'latlong': [2e-05, 2e-05], 'hours': {'sunday': {'open': '00:00', 'close': '00:00', 'open24': False}, 'monday': {'open': '07:00', 'close': '20:00', 'open24': False}}, 'phone': '256-123-1234'}
locations-locationsapi-1  | upserted_id: 61d2098e21b6b4df497fe74b
locations-locationsapi-1  | 172.22.0.1:36660 - "POST /addone HTTP/1.1" 200
locations-locationsapi-1  | {"loglevel": "info", "workers": 12, "bind": "0.0.0.0:80", "graceful_timeout": 120, "timeout": 120, "keepalive": 5, "errorlog": "-", "accesslog": "-", "workers_per_core": 1.0, "use_max_workers": null, "host": "0.0.0.0", "port": "80"}
locations-locationsapi-1  | adding:
locations-locationsapi-1  | {'foreignAPIAuthority': 'TEST', 'datetime': '2022-01-02T14:22:29.853051', 'locationID': '123456', 'chain': 'TEST', 'name': 'TEST in the ocean :)', 'address': {'street': '123 Main street', 'city': 'Huntsville', 'county': 'Madison', 'zipcode': '12345', 'state': 'AL'}, 'latlong': [0.0, 0.0], 'hours': {'sunday': {'open': '00:00', 'close': '00:00', 'open24': False}, 'monday': {'open': '07:00', 'close': '20:00', 'open24': False}}, 'phone': '256-123-1234'}
locations-locationsapi-1  | modified item id: 61d2098e21b6b4df497fe747

Again, thanks for all of your insight(s) into this topic ahead of time.

Upvotes: 1

Views: 778

Answers (0)

Related Questions