Reputation: 569
I have 5 million entries in a MongoDB collection that look like this:
{
"_id" : ObjectId("525facace4b0c1f5e78753ea"),
"productId" : null,
"name" : "example name",
"time" : ISODate("2013-10-17T09:23:56.131Z"),
"type" : "hover",
"url" : "www.example.com",
"userAgent" : "curl/7.24.0 (x86_64-apple-darwin12.0) libcurl/7.24.0 openssl/0.9.8r zlib/1.2.5"
}
I need to add to every entry a new field called device, which will have either the value desktop or mobile. That means the goal would be to have the following kind of entries:
{
"_id" : ObjectId("525facace4b0c1f5e78753ea"),
"productId" : null,
"device" : "desktop",
"name" : "example name",
"time" : ISODate("2013-10-17T09:23:56.131Z"),
"type" : "hover",
"url" : "www.example.com",
"userAgent" : "curl/7.24.0 (x86_64-apple-darwin12.0) libcurl/7.24.0 openssl/0.9.8r zlib/1.2.5"
}
I am working with the MongoDB Java driver and so far I am doing the following:
DBObject query = new BasicDBObject();
query.put("device", new BasicDBObject("$exists", false)); // some entries already have this field
DBCursor cursor = resource.find(query);
cursor.addOption(Bytes.QUERYOPTION_NOTIMEOUT);
Iterator<DBObject> iterator = cursor.iterator();
int size = cursor.count();
I then iterate with while(iterator.hasNext()), doing an if-else with a huge regular expression I found online, and depending on the result of that if-else I execute something like:
BasicDBObject newDocument = new BasicDBObject("$set", new BasicDBObject().append("device", "desktop")); // or "mobile", depending on the if-else
BasicDBObject searchQuery = new BasicDBObject("_id", id);
resource.getCollection(DatabaseConfiguration.WEBSITE_STATISTICS).update(searchQuery, newDocument);
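For context, the if-else boils down to something like the sketch below. The pattern is a heavily shortened, hypothetical stand-in for the actual regular expression, and DeviceClassifier and classify are illustrative names only. The pattern is compiled once and reused for every document.

```java
import java.util.regex.Pattern;

final class DeviceClassifier {
    // Hypothetical, much shortened stand-in for the large user-agent regex.
    // Compiled once so it is not rebuilt for each of the millions of documents.
    private static final Pattern MOBILE = Pattern.compile(
            "android|iphone|ipod|ipad|blackberry|iemobile|opera mini|mobile",
            Pattern.CASE_INSENSITIVE);

    // Returns "mobile" if the user agent matches the pattern, otherwise "desktop".
    // A missing user agent is treated as "desktop" (an assumption of this sketch).
    static String classify(String userAgent) {
        if (userAgent != null && MOBILE.matcher(userAgent).find()) {
            return "mobile";
        }
        return "desktop";
    }

    public static void main(String[] args) {
        System.out.println(classify(
                "curl/7.24.0 (x86_64-apple-darwin12.0) libcurl/7.24.0 openssl/0.9.8r zlib/1.2.5"));
        System.out.println(classify(
                "Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X) AppleWebKit/537.51.1 Mobile/11A465"));
    }
}
```

The returned string is what ends up as the value of device in the $set document.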
However, due to the large amount of data (more than 5 million entries), this takes forever.
Is there a way of doing this with MapReduce? So far I've only used MapReduce for counting, so I am not sure whether it can be used for other purposes.
Upvotes: 1
Views: 166
Reputation: 569
I found a way, although it was somewhat tricky because of all the configuration involved.
After installing Hadoop following this link, I did the following:
Created one class called MongoUpdate, with a method run where I set up all the configuration (like the input and output URIs), create a job and configure all its settings. Among those, there is job.setMapperClass(MongoMapper.class).
Created MongoMapper, where I have the method map, which gets a BSONObject. Here I perform the if-else condition and at the very end I do:
Text id = new Text(pValue.get("_id").toString());
pContext.write(id, new BSONWritable(pValue));
Created a class Main, whose main method simply instantiates a MongoUpdate object and calls its run method.
Exported the jar with all the libraries and typed in the terminal: hadoop jar NameOfTheJar.jar
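To make the map step more concrete, here is a minimal sketch of what happens per document, written so that it runs without Hadoop or MongoDB on the classpath. The assumptions: a java.util.Map stands in for the BSONObject, the returned entry stands in for pContext.write(id, value), the substring check is only a placeholder for the real regular expression, and MapStepSketch and mapStep are made-up names. It also assumes that the device field is set on the document before it is written, which the steps above do not spell out.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class MapStepSketch {
    // Stand-in for the body of MongoMapper.map(): 'doc' plays the role of the
    // BSONObject, and the returned entry plays the role of pContext.write(id, value).
    static Map.Entry<String, Map<String, Object>> mapStep(Map<String, Object> doc) {
        // Leave documents that already carry a device field untouched.
        if (!doc.containsKey("device")) {
            Object userAgent = doc.get("userAgent");
            // Placeholder test; the real job would apply the full regular expression here.
            boolean mobile = userAgent != null
                    && userAgent.toString().toLowerCase(Locale.ROOT).contains("mobi");
            doc.put("device", mobile ? "mobile" : "desktop");
        }
        // Key the output by the document id so each result maps back to one document.
        return new SimpleEntry<>(String.valueOf(doc.get("_id")), doc);
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", "525facace4b0c1f5e78753ea");
        doc.put("userAgent", "curl/7.24.0 (x86_64-apple-darwin12.0) libcurl/7.24.0");
        Map.Entry<String, Map<String, Object>> out = mapStep(doc);
        System.out.println(out.getKey() + " -> " + out.getValue().get("device"));
    }
}
```

In the real mapper the same logic would operate on pValue, with the key wrapped in Text and the document wrapped in BSONWritable as in the snippet above.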
Upvotes: 0