user3454507

Reputation: 1

How to send multi-line text as the key/value to a Mapper in MapReduce, instead of a single line

I'm studying MapReduce and have run into a problem.

My data looks like this (for example):

<Doc>
<ID> No-001 </ID>
<Value> This is 001 Value. </Value>
</Doc>

<Doc>
<ID> No-002 </ID>
<Value> This is 002 Value. </Value>
</Doc>
...

I need to transform the text above into:

This is 001 Value. No-001
This is 002 Value. No-002
...

I want to send the multiple lines between <Doc> and </Doc> to the Mapper as a single value. The key can be anything. I have searched for examples, but I couldn't find one that solves this problem.

I think I need to customize the InputFormat to solve it.

How can I do this?

Upvotes: 0

Views: 142

Answers (1)

Viacheslav Rodionov

Reputation: 2345

You should use Mahout's XmlInputFormat class to parse XML files. It lets you configure your driver code like this:

conf.set("xmlinput.start", "<Doc>");
conf.set("xmlinput.end", "</Doc>");
job.setInputFormatClass(XmlInputFormat.class);

Then, inside your mapper, you can process the XML content of each record with any parser you like. There is a good tutorial for the XmlInputFormat class.
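As a rough sketch of the parsing step: with the configuration above, each mapper value should hold one whole <Doc>...</Doc> block as text. The helper below is hypothetical (the name DocRecordParser and its method are not part of Mahout or Hadoop) and uses only the JDK's DOM parser to turn one such block into the "value ID" line from the question. In a real mapper you would presumably call it with value.toString() and write the result to the context.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only: not part of Mahout or Hadoop.
class DocRecordParser {

    // Converts one "<Doc>...</Doc>" record into "<Value text> <ID text>".
    static String toOutputLine(String docXml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(docXml)));

            // Assumes each record has exactly one <ID> and one <Value> element.
            String id = doc.getElementsByTagName("ID").item(0).getTextContent().trim();
            String value = doc.getElementsByTagName("Value").item(0).getTextContent().trim();
            return value + " " + id;
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers can decide how to handle bad records.
            throw new IllegalArgumentException("Could not parse record: " + docXml, e);
        }
    }

    public static void main(String[] args) {
        // Same shape as the sample data in the question.
        String record = "<Doc>\n<ID> No-001 </ID>\n<Value> This is 001 Value. </Value>\n</Doc>";
        System.out.println(toOutputLine(record)); // prints "This is 001 Value. No-001"
    }
}
```

Building a new DocumentBuilder per record keeps the sketch simple; in a mapper that handles many records you may prefer to create it once in setup() and reuse it.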

Upvotes: 1
