Eddy

Reputation: 1812

Writing Hadoop reduce output to Elasticsearch

I'm having a bit of trouble understanding how to write the output of a simple Hadoop job back into Elasticsearch.

The job is configured as:

job.setOutputFormatClass(EsOutputFormat.class);
job.setOutputKeyClass(NullWritable.class);
job.setOutputValueClass(MapWritable.class);
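
For completeness, the es-hadoop connection settings that accompany this look roughly like the following (the node address and index/type are placeholders):

configuration.set("es.nodes", "localhost:9200");    // Elasticsearch node to connect to (placeholder)
configuration.set("es.resource", "myindex/mytype"); // target index/type (placeholder)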

The reducer does:

final DoubleWritable average = new DoubleWritable(sum / size);

final MapWritable output = new MapWritable();
output.put(key, average);
context.write(NullWritable.get(), output);

Yet I get this inexplicable (to me) exception:

14/08/15 16:59:54 INFO mapreduce.Job: Task Id : attempt_1408106733881_0013_r_000000_2, Status : FAILED Error: 
  org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: 
  [org.elasticsearch.hadoop.serialization.field.MapWritableFieldExtractor@5796fabe] cannot extract value from object [org.apache.hadoop.io.MapWritable@dcdb8e97]    
    at org.elasticsearch.hadoop.serialization.bulk.TemplatedBulk$FieldWriter.write(TemplatedBulk.java:49)   
    at org.elasticsearch.hadoop.serialization.bulk.TemplatedBulk.writeTemplate(TemplatedBulk.java:101)  
    at org.elasticsearch.hadoop.serialization.bulk.TemplatedBulk.write(TemplatedBulk.java:77)   
    at org.elasticsearch.hadoop.rest.RestRepository.writeToIndex(RestRepository.java:130)   
    at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.write(EsOutputFormat.java:161)     
    at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:558)

at the context.write() call.

I'm a bit baffled; any ideas?

Upvotes: 2

Views: 1213

Answers (1)

Eddy

Reputation: 1812

It turned out I had made a mistake in the job configuration. I had added the following:

configuration.set("es.mapping.id", "_id");

without actually adding the _id field to the outgoing MapWritable; this caused ES to throw the exception.
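
A minimal sketch of the corrected reducer output, assuming the document id is derived from the reduce key (any unique value would do):

final MapWritable output = new MapWritable();
// es.mapping.id = "_id" means the outgoing map must contain an "_id" entry.
output.put(new Text("_id"), new Text(key.toString())); // id derived from the key (illustrative)
output.put(key, average);
context.write(NullWritable.get(), output);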

It would be useful if MapWritableFieldExtractor logged which field it failed on.

Upvotes: 1
