Reputation: 286
I have to deal with inconsistent documents in MongoDB collection where some field might be numeric or might have NaN value. I need to update it with $inc. But looks like if it have NaN value $inc have no effect. What options available for atomic document update?
Upvotes: 1
Views: 5199
Reputation: 151122
Well this seems to lead to two logical conclusions. The first being that if there are NaN
values present in a field then how to identify them? Consider the following sample, let's call the collection "nantest"
{ "_id" : ObjectId("54055993b145d1c015a1ad41"), "n" : NaN }
{ "_id" : ObjectId("540559e8b145d1c015a1ad42"), "n" : Infinity }
{ "_id" : ObjectId("54055b59b145d1c015a1ad43"), "n" : 1 }
{ "_id" : ObjectId("54055ea1b145d1c015a1ad44"), "n" : -Infinity }
So both NaN
and Infinity
or -Infinity
are representative of "non-numbers" that have somehow emerged in your data. The best way to find these documents where that field is set that way is to use the $where
operator for a JavaScript evaluated query condition. Not efficient but it what you have got:
db.nantest.find({
"$where": "return isNaN(this.n) || Math.abs(this.n) == Infinity"
})
So that gives a way of finding the data that is the problem. From here you could jump through hoops and decide that where this was encountered you would just reset it to 0 before incrementing, essentially issuing two update statements where the first one would not match a document to update if the value was correct:
db.nantest.update(
{ "$where": "return isNaN(this.n) || Math.abs(this.n) == Infinity" },
{ "$set": { "n": 0 } }
);
db.nantest.update(
{ },
{ "$inc": { "n": 1 } }
);
But really when you look at that, why would you want to patch your code to cater for this when you can just patch the data. So the logical thing to finally conclude is just update all the Nan
and possibly Infinity
values to a standard reset number in one statement:
db.nantest.update(
{ "$where": "return isNaN(this.n) || Math.abs(this.n) == Infinity" },
{ "$set": { "n": 0 } },
{ "multi": true }
);
Run one statement and then you don't have to change your code and simply process increments as you should normally expect.
If your trouble is knowing which fields have the Nan
values present in order to invoke updates to fix them, then consider something along the lines of this mapReduce process to inspect the fields:
db.nantest.mapReduce(
function () {
var doc = this;
delete doc._id;
Object.keys( doc ).forEach(function(key) {
if ( isNaN( doc[key] ) || Math.abs(doc[key]) == Infinity )
emit( key, 1 );
});
},
function (key,values) {
return Array.sum( values );
},
{ "out": { "inline": 1 } }
)
For which you might need to add some complexity to for more nested documents, but this tells you which fields can possibly contain the errant values so you can construct update statements to fix them.
It would seem that rather than bending your code to suit this you "should be" doing:
Find the source that is causing the numbers to appear and fix that.
Identify the field or fields that contain these values
Process one off update statement to fix the data all at once.
Minimal messing with your code and it both fixes the "source" of the problem and the "result" of that data corruption that was introduced.
Upvotes: 4