Reputation: 8333
I am trying to push some big files (around 4 million records) into a mongo instance. What I am basically trying to achieve is to update the existing data with the data from the files. The algorithm would look something like:
import time

rowHeaders = ('orderId', 'manufacturer', 'itemWeight')

for row in dataFile:
    # Build a dict for the current tab-separated line
    row = row.strip('\n').split('\t')
    row = dict(zip(rowHeaders, row))

    # Look up the existing document for this order
    mongoRow = mongoCollection.find_one({'orderId': row['orderId']})
    if mongoRow is not None:
        # Only bump the timestamp when the weight actually changed
        if mongoRow['itemWeight'] != row['itemWeight']:
            row['tsUpdated'] = time.time()
    else:
        row['tsUpdated'] = time.time()

    # Replace the document, or insert it if it does not exist yet
    mongoCollection.update({'orderId': row['orderId']}, row, upsert=True)
So the algorithm is: update the whole row except 'tsUpdated' if the weights are the same, insert a new row if it is not in mongo yet, or update the whole row including 'tsUpdated' otherwise.
The question is: can this be done faster, more easily and more efficiently from mongo's point of view (possibly with some kind of bulk insert)?
Upvotes: 4
Views: 4930
Reputation: 400
Combine a unique index on orderId with an update query that also checks for a change in itemWeight. The unique index prevents an insert with only a modified timestamp if the orderId is already present and itemWeight is the same.
mongoCollection.ensure_index('orderId', unique=True)

mongoCollection.update({'orderId': row['orderId'],
                        'itemWeight': {'$ne': row['itemWeight']}},
                       row, upsert=True)
My benchmark shows a 5-10x performance improvement over your algorithm (depending on the ratio of inserts to updates).
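If you also want the bulk write the question asks about, the same filter can be fed to PyMongo's bulk_write with ReplaceOne operations. This is only a minimal sketch under a few assumptions: PyMongo 3.x, a hypothetical mydb.orders collection, and that the duplicate-key errors raised for unchanged rows can simply be swallowed (they are just the unique index rejecting the no-op upserts).

import time
from pymongo import MongoClient, ReplaceOne
from pymongo.errors import BulkWriteError

client = MongoClient()
mongoCollection = client['mydb']['orders']      # hypothetical database/collection names
mongoCollection.create_index('orderId', unique=True)

def flush(batch):
    # ordered=False lets the remaining operations run even when individual
    # upserts are rejected by the unique index (orderId present, itemWeight unchanged).
    try:
        mongoCollection.bulk_write(batch, ordered=False)
    except BulkWriteError:
        pass

batch = []
for row in rows:                                # rows: dicts built from the data file
    row['tsUpdated'] = time.time()
    batch.append(ReplaceOne(
        {'orderId': row['orderId'],
         'itemWeight': {'$ne': row['itemWeight']}},
        row, upsert=True))
    if len(batch) >= 1000:                      # flush in chunks to bound memory
        flush(batch)
        batch = []
if batch:
    flush(batch)

Batching the operations avoids one round trip per record, which is usually where most of the time goes with 4 million rows; the $ne filter still makes sure unchanged rows are never rewritten.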
Upvotes: 6