Bob

Reputation: 1001

Reading and writing CSV files in a Hadoop application

I need to process custom CSV files as input and write CSV files back to HDFS. Can I do this directly inside the MapReduce classes where the real processing takes place?

For parsing my CSV files, I am using the opencsv library. I have seen some tutorials where the inputformat and outputformat flags are used to specify Java classes that handle user-defined formats. Can someone please give advice on how to work with CSV files?

I want to stick with what Hadoop has to offer; otherwise my own implementation of input and output formats may make my processing slow.

Upvotes: 3

Views: 7370

Answers (1)

David Gruzman

Reputation: 8088

The question is whether you need multi-line CSV records or not.
If you do not need them, you can use the vanilla TextInputFormat and TextOutputFormat and use opencsv inside your mapper to parse the lines. For the output, TextOutputFormat is also just fine.
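For the single-line case, the per-record work boils down to parsing one CSV line inside `map()`. Here is a minimal sketch of that parsing step; in a real job you would call opencsv's parser instead, but a hand-rolled quote-aware splitter is used here so the example is self-contained (the class name and method are illustrative, not from any library):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the parsing logic that would live inside a Mapper's map() method.
// In a real job this would sit in a class extending
// Mapper<LongWritable, Text, Text, Text>, with opencsv doing the parsing.
public class CsvLineParser {

    // Split one CSV line into fields, honouring double-quoted fields
    // so commas inside quotes are not treated as separators.
    public static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;        // toggle quoted state
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString()); // field boundary
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field
        return fields;
    }

    public static void main(String[] args) {
        // A quoted field containing a comma stays one field.
        System.out.println(parseLine("a,\"b,c\",d")); // prints [a, b,c, d]
    }
}
```

With TextInputFormat, each `map()` call receives one physical line as its value, so this is all the CSV handling the mapper needs in the single-line case.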
If you need multi-line records, there are some hacks you have to do to assemble the logical records. You can create your own input format for that, or do it inside the mapper.
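If you assemble records inside the mapper, the key check is whether a physical line ends inside an open quoted field, meaning the logical record continues on the next line. A small sketch of that check, assuming the standard convention that quotes are escaped by doubling (`""`), which cancels out in pairs (class and method names are illustrative):

```java
// Sketch: detecting whether a CSV line ends inside an open quoted field.
// A mapper assembling multi-line records would buffer lines while this
// returns true, then parse the concatenated logical record. Note that
// across buffered lines you must track the cumulative quote parity,
// not just the current line's.
public class CsvRecordBoundary {

    // True if the line contains an odd number of double-quote characters,
    // i.e. the quoted field is still open at the end of the line.
    // Escaped quotes ("") contribute an even count, so they do not
    // disturb the parity.
    public static boolean endsInsideQuotes(String line) {
        int quotes = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == '"') {
                quotes++;
            }
        }
        return quotes % 2 == 1;
    }
}
```

Be aware that doing this in the mapper only works reliably when a logical record cannot span an input-split boundary; handling that case cleanly is exactly why a custom input format is the more robust option.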

Upvotes: 4
