bgraham
bgraham

Reputation: 1997

Is a MongoDB bulk upsert possible? C# Driver

I'd like to do a bulk upsert in Mongo. Basically I'm getting a list of objects from a vendor, but I don't know which ones I've gotten before (and need to be updated) vs which ones are new. One by one I could do an upsert, but UpdateMany doesn't work with upsert options.

So I've resorted to selecting the documents, updating in C#, and doing a bulk insert.

    public async Task BulkUpsertData(List<MyObject> newUpsertDatas)
    {
        var usernames = newUpsertDatas.Select(p => p.Username);
        var filter = Builders<MyObject>.Filter.In(p => p.Username, usernames);

        //Find all records that are in the list of newUpsertDatas (these need to be updated)
        var collection = Db.GetCollection<MyObject>("MyCollection");
        var existingDatas = await collection.Find(filter).ToListAsync();

        //loop through all of the new data, 
        foreach (var newUpsertData in newUpsertDatas)
        {
            //and find the matching existing data
            var existingData = existingDatas.FirstOrDefault(p => p.Id == newUpsertData.Id);
            //If there is existing data, preserve the date created (there are other fields I preserve)
            if (existingData == null)
            {
                newUpsertData.DateCreated = DateTime.Now;
            }
            else
            {
                newUpsertData.Id = existingData.Id;
                newUpsertData.DateCreated = existingData.DateCreated;
            }
        }

        await collection.DeleteManyAsync(filter);
        await collection.InsertManyAsync(newUpsertDatas);
    }

Is there a more efficient way to do this?

EDIT:

I did some speed tests.

In preparation I inserted 100,000 records of a pretty simple object. Then I upserted 200,000 records into the collection.

Method 1 is as outlined in the question. SelectMany, update in code, DeleteMany, InsertMany. This took approximately 5 seconds.

Method 2 was making a list of UpdateOneModel with Upsert = true and then doing one BulkWriteAsync. This was super slow. I could see the count in the mongo collection increasing so I know it was working. But after about 5 minutes it had only climbed to 107,000 so I canceled it.

I'm still interested if anyone else has a potential solution

Upvotes: 1

Views: 6135

Answers (1)

Kirk Larkin
Kirk Larkin

Reputation: 93093

Given that you've said you could do a one-by-one upsert, you can achieve what you want with BulkWriteAsync. This allows you to create one or more instances of the abstract WriteModel, which in your case would be instances of UpdateOneModel.

In order to achieve this, you could do something like the following:

var listOfUpdateModels = new List<UpdateOneModel<T>>();

// ...

var updateOneModel = new UpdateOneModel<T>(
    Builders<T>.Filter. /* etc. */,
    Builders<T>.Update. /* etc. */)
{
    IsUpsert = true;
};

listOfUpdateModels.Add(updateOneModel);

// ...

await mongoCollection.BulkWriteAsync(listOfUpdateModels);

The key to all of this is the IsUpsert property on UpdateOneModel.

Upvotes: 7

Related Questions