How do you unit test mongo-hadoop jobs?
My attempt so far:
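import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
// New-API driver; assumes WordMapper extends org.apache.hadoop.mapreduce.Mapper
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.bson.BSONObject;
import org.bson.BasicBSONObject;
import org.junit.Before;
import org.junit.Test;

public class MapperTest {
    MapDriver<Object, BSONObject, Text, IntWritable> d;

    @Before
    public void setUp() throws IOException {
        // WordMapper is the mapper under test (its source is not shown here)
        WordMapper mapper = new WordMapper();
        d = MapDriver.newMapDriver(mapper);
    }

    @Test
    public void testMapper() throws IOException {
        BSONObject doc = new BasicBSONObject("sentence", "Two words");
        d.withInput(new Text("anykey"), doc);
        d.withOutput(new Text("Two"), new IntWritable(1));
        d.withOutput(new Text("words"), new IntWritable(1));
        d.runTest();
    }
}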
This produces the following output:
No applicable class implementing Serialization in conf at io.serializations for class org.bson.BasicBSONObject
java.lang.IllegalStateException
    at org.apache.hadoop.mrunit.internal.io.Serialization.copy(Serialization.java:67)
    at org.apache.hadoop.mrunit.internal.io.Serialization.copy(Serialization.java:91)
    at org.apache.hadoop.mrunit.internal.io.Serialization.copyWithConf(Serialization.java:104)
    at org.apache.hadoop.mrunit.TestDriver.copy(TestDriver.java:608)
    at org.apache.hadoop.mrunit.TestDriver.copyPair(TestDriver.java:612)
    at org.apache.hadoop.mrunit.MapDriverBase.addInput(MapDriverBase.java:118)
    at org.apache.hadoop.mrunit.MapDriverBase.withInput(MapDriverBase.java:207)
    ...
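The trace shows where the failure comes from: withInput() calls addInput(), which copies the pair through Serialization.copy, and that copy needs a serialization from io.serializations that accepts the value's class. The sketch below is a simplified, hypothetical model of that lookup in plain Java (SerializationLookup, WritableLike, TextLike and BsonDocLike are made-up names, not Hadoop classes): the first registered serialization whose accept test passes is used, and if none passes, the lookup fails with the same message as above. As a simplification it registers only a Writable-style serialization by default; the real default list may contain others, but the error shows that none of them accepts BasicBSONObject.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Simplified, hypothetical model of searching a list of serializations
// (like the comma-separated "io.serializations" setting) for a class.
// None of these types are Hadoop's real classes.
public class SerializationLookup {

    // Stand-in for Hadoop's Writable interface.
    interface WritableLike {}

    // Stand-in for org.apache.hadoop.io.Text: implements WritableLike.
    public static class TextLike implements WritableLike {}

    // Stand-in for org.bson.BasicBSONObject: does NOT implement WritableLike.
    public static class BsonDocLike {}

    // One registered serialization: a name plus a test for accepted classes.
    private static final class Entry {
        final String name;
        final Predicate<Class<?>> accepts;

        Entry(String name, Predicate<Class<?>> accepts) {
            this.name = name;
            this.accepts = accepts;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    // Registers a serialization; order matters because the first match wins.
    public SerializationLookup register(String name, Predicate<Class<?>> accepts) {
        entries.add(new Entry(name, accepts));
        return this;
    }

    // Returns the first serialization that accepts the class, or fails with
    // the message MRUnit reports in the question.
    public String find(Class<?> c) {
        for (Entry e : entries) {
            if (e.accepts.test(c)) {
                return e.name;
            }
        }
        throw new IllegalStateException(
            "No applicable class implementing Serialization in conf at io.serializations for class "
                + c.getName());
    }

    // Models a setup where only a Writable-style serialization is registered.
    public static SerializationLookup defaultLookup() {
        return new SerializationLookup()
            .register("WritableSerialization", WritableLike.class::isAssignableFrom);
    }

    public static void main(String[] args) {
        SerializationLookup lookup = defaultLookup();
        System.out.println(lookup.find(TextLike.class));
        try {
            lookup.find(BsonDocLike.class);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running main prints WritableSerialization for the Text stand-in, then the "No applicable class implementing Serialization ..." message for the document stand-in. Registering a second entry that accepts the document class makes the same lookup succeed.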
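You need to register a serializer for the BSON type. MRUnit copies every input pair through Hadoop's serialization framework (the MapDriverBase.addInput and Serialization.copy frames in the trace), and it can only use the serializations listed in io.serializations. BasicBSONObject is not a Writable, so none of the configured ones accepts it. Set the list on the driver's configuration before calling withInput(), for example in setUp():

    mapDriver.getConfiguration().setStrings("io.serializations",
        "org.apache.hadoop.io.serializer.WritableSerialization",
        MongoSerDe.class.getName());

Keep WritableSerialization in the list so that the Text and IntWritable keys and values can still be copied.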
Source of MongoSerDe: https://gist.github.com/lfrancke/01d1819a94f14da171e3
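The gist is not reproduced here. As a rough illustration of the job any such serializer has to do, here is a hypothetical stand-in for a flat String-to-String document (DocumentRoundTrip is a made-up name, and this is not MongoSerDe's code). Judging from the Serialization.copy frames in the question's trace, MRUnit copies each input by writing it out and reading a new object back, so serialize() and deserialize() must agree exactly on the byte layout.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a document serializer: NOT MongoSerDe's code.
// A "document" here is a flat map of String keys to String values.
public class DocumentRoundTrip {

    // Writes the entry count, then each key/value pair.
    public static void serialize(Map<String, String> doc, DataOutputStream out) throws IOException {
        out.writeInt(doc.size());
        for (Map.Entry<String, String> e : doc.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeUTF(e.getValue());
        }
    }

    // Reads back exactly what serialize() wrote, preserving entry order.
    public static Map<String, String> deserialize(DataInputStream in) throws IOException {
        int size = in.readInt();
        Map<String, String> doc = new LinkedHashMap<>();
        for (int i = 0; i < size; i++) {
            String key = in.readUTF();
            doc.put(key, in.readUTF());
        }
        return doc;
    }

    // A copy is a write followed by a read into a brand-new object.
    public static Map<String, String> copy(Map<String, String> doc) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            serialize(doc, out);
        }
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return deserialize(in);
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("sentence", "Two words");
        Map<String, String> copied = copy(doc);
        System.out.println(copied);        // {sentence=Two words}
        System.out.println(copied == doc); // false: a distinct object
    }
}
```

copy() returns an equal but distinct map. A real BSON serializer would also have to handle nested documents, arrays and non-string values.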
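However, when I use MongoSerDe I get the error "org.bson.io.BasicOutputBuffer.pipe(Ljava/io/DataOutput;)I". That text is a JVM method signature, the form a java.lang.NoSuchMethodError message takes, so it probably means the mongo-java-driver on the classpath is a different version from the one MongoSerDe was compiled against.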