Raja Nagendra Kumar
Raja Nagendra Kumar

Reputation: 790

Sum of Substrings in mongodb

We have field(s) in mongodb which has numbers in string form, values such as "$123,00,89.00" or "1234$" etc

Is it possible to customize $sum accumulators in mongodb, so that, certain processing can be done at each field value while the sum is performed. Such as substring or reg-ex processing etc.

Upvotes: 1

Views: 331

Answers (1)

Blakes Seven
Blakes Seven

Reputation: 50406

The .mapReduce() method is what you need here. You cannot "cast" values in the aggregation framework from one "type" to another ( with the exception of "to string" or from Date to numeric ).

The JavaScript processing means that you can convert a string into a value for "summing". Somthing like this ( with a bit more work on a "safe" regex for the required "currency" values:

db.collection.mapReduce(
    function() {
        emit(null, this.amount.replace(/\$|,|\./g,"") / 100 );
    },
    function(key,values) {
        return Array.sum(values);
    },
    { "out": { "inline": 1 } }
)

Or with .group() which also uses JavaScript procesing, but is a bit more restrcitive in it's requirements:

db.collection.group({
    "key": null,
    "reduce": function( curr,result ) {
        result.total += curr.amount.replace(/\$|,|\./g,"") /100;
    },
    "initial": { "total": 0 }
});

So JavaScript processing is your only option as these sorts of operations are not supported in the aggregatation framework.

A number can be a string:

db.junk.aggregate([{ "$project": { "a": { "$substr": [ 1,0,1 ] } } }])
{ "_id" : ObjectId("55a458c567446a4351c804e5"), "a" : "1" }

And a Date can become a number:

db.junk.aggregate([{ "$project": { "a": { "$subtract": [ new Date(), new Date(0) ] } } }])
{ "_id" : ObjectId("55a458c567446a4351c804e5"), "a" : NumberLong("1436835669446") }

But there are no other operators to "cast" a "string" to "numeric" or even anthing to do a Regex replace as shown above.

If you want to use .aggregate() then you need to fix your data into a format that will support it, thus "numeric":

var bulk = db.collection.initializeOrderedBulkOp(),
    count = 0;

db.collection.find({ "amount": /\$|,\./g }).forEach(function(doc) {
    doc.amount = doc.amount.replace(/\$|,|\./g,"") /100;
    bulk.find({ "_id": doc._id }).updateOne({
        "$set": { "amount": doc.amount }
    });
    count++;

    // execute once in 1000 operations
    if ( count % 1000 == 0 ) {
        bulk.execute();
        bulk = db.collection.initializeOrderedBulkOp();
    }
});

// clean up queued operations
if ( count % 1000 != 0 )
    bulk.execute();

Then you can use .aggregate() on your "numeric" data:

db.collection.aggregate([
    { "$group": { "_id": null, "total": { "$sum": "$amount" } } }
])

Upvotes: 1

Related Questions