I'm studying MapReduce and have run into a problem.
My data looks like this (example):
<Doc>
<ID> No-001 </ID>
<Value> This is 001 Value. </Value>
</Doc>
<Doc>
<ID> No-002 </ID>
<Value> This is 002 Value. </Value>
</Doc>
...
I need to transform the text above into:
This is 001 Value. No-001
This is 002 Value. No-002
...
I want to pass the multiple lines between <Doc> and </Doc> to the Mapper as a single value. The key can be anything. I have searched for examples, but I haven't been able to solve this problem.
I think I need to customize the InputFormat to do this.
How can I do this?
You should use Mahout's XmlInputFormat class to parse XML files. It lets you configure your driver code like this:
conf.set("xmlinput.start", "<Doc>");
conf.set("xmlinput.end", "</Doc>");
job.setInputFormatClass(XmlInputFormat.class);
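To make concrete what `xmlinput.start` and `xmlinput.end` do, here is a plain-Java sketch with no Hadoop dependency of the record boundaries they define: each value handed to the mapper is one substring running from `<Doc>` through `</Doc>`, with both tags included. `XmlRecordSplitter` is a hypothetical name, and this is not Mahout's code; the real record reader streams bytes from an input split and also has to deal with records that straddle split boundaries.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: illustrates the records an XML start/end-tag record
 * reader yields. Not Mahout's implementation.
 */
public class XmlRecordSplitter {

    /** Returns every substring that starts with startTag and ends with endTag (tags included). */
    public static List<String> split(String input, String startTag, String endTag) {
        List<String> records = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = input.indexOf(startTag, from);
            if (start < 0) {
                break; // no more records
            }
            int end = input.indexOf(endTag, start + startTag.length());
            if (end < 0) {
                break; // unterminated record at the end of the input: ignore it
            }
            int stop = end + endTag.length();
            records.add(input.substring(start, stop));
            from = stop; // continue scanning after this record
        }
        return records;
    }

    public static void main(String[] args) {
        String data = "<Doc>\n<ID> No-001 </ID>\n<Value> This is 001 Value. </Value>\n</Doc>\n"
                + "<Doc>\n<ID> No-002 </ID>\n<Value> This is 002 Value. </Value>\n</Doc>\n";
        for (String record : split(data, "<Doc>", "</Doc>")) {
            System.out.println("--- one mapper value ---");
            System.out.println(record);
        }
    }
}
```

Text outside the tags (such as the newlines between documents) never reaches the mapper, which is why the key can be anything: the value alone carries the whole document.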
Then, inside your mapper, each value is the full text of one record from <Doc> to </Doc>, and you can process that XML with any parser you like. There is a good tutorial on the XmlInputFormat class.
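A minimal sketch of the mapper-side step, using the JDK's built-in DOM parser (`javax.xml.parsers`) as the "any parser". `DocRecordFormatter` is a hypothetical helper; it assumes each record contains exactly one `<ID>` and one `<Value>` element, as in the question's sample, and produces the requested `Value ID` line.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

/** Hypothetical helper: turns one "<Doc>...</Doc>" record into "Value ID". */
public class DocRecordFormatter {

    public static String format(String record) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(record.getBytes(StandardCharsets.UTF_8)));
            String id = textOf(doc, "ID");
            String value = textOf(doc, "Value");
            return value + " " + id;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse record: " + record, e);
        }
    }

    /** Text of the first element with the given tag, with surrounding spaces trimmed. */
    private static String textOf(Document doc, String tag) {
        Node node = doc.getElementsByTagName(tag).item(0);
        if (node == null) {
            throw new IllegalArgumentException("Missing <" + tag + "> element");
        }
        // "<ID> No-001 </ID>" has spaces inside the tags, so trim them
        return node.getTextContent().trim();
    }

    public static void main(String[] args) {
        String record = "<Doc>\n<ID> No-001 </ID>\n<Value> This is 001 Value. </Value>\n</Doc>";
        System.out.println(format(record)); // prints "This is 001 Value. No-001"
    }
}
```

Inside a `Mapper<LongWritable, Text, Text, NullWritable>` you could call `DocRecordFormatter.format(value.toString())` and write the result as the output key with `NullWritable.get()` as the value. Creating the `DocumentBuilder` once in the mapper's `setup()` instead of per record would avoid rebuilding the parser for every document.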