Reputation: 301
After using Mongoimport
to import a CSV
file to my database, I want to add a new field or element per document. And, the data per for this new field is the is the index number plus 2.
Dim documents = DB.GetCollection(Of BsonDocument)(collectionName).Find(filterSelectedDocuments).ToListAsync.Result
For Each doc in documents
DB.GetCollection(Of BsonDocument)(collectionName).UpdateOneAsync(
Builders(Of BsonDocument).Filter.Eq(Of ObjectId)("_id", doc.GetValue("_id").AsObjectId),
Builders(Of BsonDocument).Update.Set(Of Integer)("increment.value", documents.IndexOf(doc) + 2).Wait()
Next
If I have over a million of data to import, is there a better way to achieved this like using UpdateManyAsync
?
Upvotes: 0
Views: 99
Reputation: 10918
Just as a side note: Since you've got the Wait()
and the Result
everywhere, the Async
methods don't seem to make an awful lot of sense. Also, your logic appears flawed since there is no .Sort()
anywhere. So you've got no guarantee about the order of your returned documents. Is it indended that every document just gets a kind of random but unique and increasing number assigned?
Anyway, to make this faster, you'd really want to patch your CSV file and write the increasing "increment.value" field straight into it before the import. This way, you've got your value directly in MongoDB and do not need to query and update the imported data again.
If this is not an option you could optimize your code like this:
_id
of your documents - that's all you need and it will majorly impact your .find()
perfomance since a lot less data needs to be transferred/deserialized from MongoDB.Enumerable
of your result instead of using a fully populated list.yield
semantics for nicer streaming. However, that's getting a little complicated and may not even be needed.The following should get you going faster already:
' just some cached values
Dim filterDefinitionBuilder = Builders(Of BsonDocument).Filter
Dim updateDefinitionBuilder = Builders(Of BsonDocument).Update
Dim collection = DB.GetCollection(Of BsonDocument)(collectionName)
' load only _id field
Dim documentIds = collection.Find(filterSelectedDocuments).Project(Function(doc) doc.GetValue("_id")).ToEnumerable()
' bulk write buffer (pre-initialized to size 1000 to avoid memory traffic upon array expansion)
Dim updateModelsBuffer = new List(Of UpdateOneModel(Of BsonDocument))(1000)
' starting value for our update counter
Dim i As Long = 2
For Each objectId In documentIds
' for every document we want one update command...
' ...that finds exactly one document identified by its _id field
Dim filterDefinition = filterDefinitionBuilder.Eq(Of ObjectId)("_id", objectId)
' ...and updates the "increment.value" with our running counter
Dim updateDefinition = updateDefinitionBuilder.Set(Of Integer)("increment.value", i)
updateModelsBuffer.Add(New UpdateOneModel(Of BsonDocument)(filterDefinition, updateDefinition))
' every e.g. 1000 documents
If updateModelsBuffer.Count = 1000
' we flush the contents to the database
collection.BulkWrite(updateModelsBuffer)
' and we empty our buffer list
updateModelsBuffer.Clear()
End If
i = i + 1
Next
' flush left over commands that have not been written yet in case we do not have a multiple of 1000 documents
collection.BulkWrite(updateModelsBuffer)
Upvotes: 1