MarkM
MarkM

Reputation: 53

MongoDB - Update text to Proper / title Case

We have a large collection of documents with various text case when entered for their description eg

Desc = 
'THE CAT"
or
"The Dog"
or
"the cow"

We want to make all consistent in Title (Or Proper case) where first letter of each word is upper and rest lower case.

"The Cat", "The Dog", "The Cow"

Looking for assistance in creating update query to do that on mass, rather than manual as data team is doing at present.

thanks

Upvotes: 1

Views: 2538

Answers (1)

chridam
chridam

Reputation: 103395

The algorithm for changing the title case below uses Array.prototype.map() method and the String.prototype.replace() method which returns a new string with some or all matches of a pattern replaced by a replacement. In your case, the pattern for the replace() method will be a String to be replaced by a new replacement and will be treated as a verbatim string.

First off, you need to lowercase and split the string before applying the map() method. Once you define a function that implements the conversion, you then need to iterate your collection to apply an update with this function. Use the cursor.forEach() method on the cursor returned by find() to do the loop and within the loop you can then run an update on each document using the updateOne() method.

For relatively small datasets, the whole operation can be described by the following

function titleCase(str) {
    return str.toLowerCase().split(' ').map(function(word) {
        return word.replace(word[0], word[0].toUpperCase());
    }).join(' ');
}

db.collection.find({}).forEach(function(doc){
    db.collection.updateOne(
        { "_id": doc._id },
        { "$set": { "desc": titleCase(doc.desc) } }
    );
});

For improved performance especially when dealing with huge datasets, take advantage of using a Bulk() API for updating the collection efficiently in bulk as you will be sending the operations to the server in batches (for example, say a batch size of 500). This gives you much better performance since you won't be sending every request to the server but just once in every 500 requests, thus making your updates more efficient and quicker.

The following demonstrates this approach, the first example uses the Bulk() API available in MongoDB versions >= 2.6 and < 3.2. It updates all the documents in the collection by transforming the title on the desc field using the above function.

MongoDB versions >= 2.6 and < 3.2:

function titleCase(str) {
    return str.toLowerCase().split(' ').map(function(word) {
        return word.replace(word[0], word[0].toUpperCase());
    }).join(' ');
}

var bulk = db.collection.initializeUnorderedBulkOp(),
    counter = 0;

db.collection.find().forEach(function (doc) {    
    bulk.find({ "_id": doc._id }).updateOne({ 
        "$set": { "desc": titleCase(doc.desc) }
    });

    counter++;
    if (counter % 500 === 0) {
        // Execute per 500 operations
        bulk.execute(); 
        // re-initialize every 500 update statements
        bulk = db.collection.initializeUnorderedBulkOp();
    }
})
// Clean up remaining queue
if (counter % 500 !== 0) { bulk.execute(); }

The next example applies to the new MongoDB version 3.2 which has since deprecated the Bulk() API and provided a newer set of apis using bulkWrite().

MongoDB version 3.2 and greater:

var ops = [],
    titleCase = function(str) {
        return str.toLowerCase().split(' ').map(function(word) {
            return word.replace(word[0], word[0].toUpperCase());
        }).join(' ');
    };
    
db.Books.find({
    "title": {
        "$exists": true,
        "$type": 2
    }
}).forEach(function(doc) {
    ops.push({
        "updateOne": {
            "filter": { "_id": doc._id },
            "update": {
                "$set": { "title": titleCase(doc.title) }
            }
        }
    });

    if (ops.length === 500 ) {
        db.Books.bulkWrite(ops);
        ops = [];
    }
})

if (ops.length > 0)  
    db.Books.bulkWrite(ops);

Upvotes: 2

Related Questions