Reputation: 1001
I need to process custom CSV files as input and write CSV files back to HDFS. Can I do this directly inside the MapReduce class where the real processing takes place?
For parsing my CSV files, I am using the opencsv library. I have seen some tutorials where they use the inputformat and outputformat flags to specify Java classes that handle user-defined formats. Can someone please give advice on how to work with CSV files?
I want to stick with what Hadoop has to offer; otherwise my own implementation of input and output formats may slow down my processing.
Upvotes: 3
Views: 7370
Reputation: 8088
The question is whether you need multi-line CSV records or not.
If you do not need them, you can use the vanilla TextInputFormat and TextOutputFormat, and use opencsv inside your mapper to parse each line. For the output, TextOutputFormat is also just fine.
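A minimal sketch of that approach (assuming opencsv 2.x and the `mapreduce` API; the choice of emitting column 0 as key and column 1 as value is purely illustrative):

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import au.com.bytecode.opencsv.CSVParser;

// TextInputFormat hands the mapper one physical line at a time, so no
// custom InputFormat is needed as long as records never span lines.
public class CsvLineMapper extends Mapper<LongWritable, Text, Text, Text> {

    // opencsv's CSVParser splits a single line into fields,
    // honoring quoting and escaping.
    private final CSVParser parser = new CSVParser();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        String[] fields = parser.parseLine(line.toString());
        // Hypothetical emit: first column as key, second as value.
        if (fields.length >= 2) {
            context.write(new Text(fields[0]), new Text(fields[1]));
        }
    }
}
```

Since the parsing happens in the mapper, nothing about the job configuration changes beyond setting this mapper class.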
If you do need multi-line records, there are some hacks required to assemble the logical records: you can create your own InputFormat for it, or do the assembly inside the mapper.
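The mapper-side hack boils down to buffering physical lines until the logical record is complete, e.g. until the quote characters balance. A small sketch of that idea (class and method names are my own; it assumes `"` as the quote character):

```java
// Buffers physical lines until the quotes balance, i.e. until the
// logical CSV record is complete, then hands back the whole record.
public class CsvRecordAssembler {
    private final StringBuilder buffer = new StringBuilder();

    // Returns the complete logical record once quotes balance, else null.
    public String feed(String physicalLine) {
        if (buffer.length() > 0) {
            buffer.append('\n');
        }
        buffer.append(physicalLine);
        if (isBalanced(buffer)) {
            String record = buffer.toString();
            buffer.setLength(0);
            return record;
        }
        return null; // record continues on the next line
    }

    // A record is complete when it contains an even number of quotes.
    private static boolean isBalanced(CharSequence s) {
        int quotes = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '"') {
                quotes++;
            }
        }
        return quotes % 2 == 0;
    }
}
```

Be aware that this in-mapper assembly breaks when a logical record spans an input split boundary, since the continuation lines go to a different mapper; a custom InputFormat is the robust way to handle that case.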
Upvotes: 4