raghuram gururajan

Reputation: 563

Reading an Excel file in Hadoop MapReduce

I am trying to read an Excel file containing some data for aggregation in Hadoop. The MapReduce program seems to work fine, but the output it produces is in a non-readable format. Do I need to use a special InputFormat reader for Excel files in Hadoop MapReduce? My configuration is as below:

    Configuration conf=getConf();
    Job job=new Job(conf,"LatestWordCount");
    job.setJarByClass(FlightDetailsCount.class);
    Path input=new Path(args[0]);
    Path output=new Path(args[1]);
    FileInputFormat.setInputPaths(job, input);
    FileOutputFormat.setOutputPath(job, output);
    job.setMapperClass(MapClass.class);
    job.setReducerClass(ReduceClass.class);
    //job.setCombinerClass(ReduceClass.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Text.class);
    //job.setOutputKeyClass(Text.class);
    //job.setOutputValueClass(Text.class);
    System.exit(job.waitForCompletion(true)?0:1);
    return 0;

The output produced looks like this: �KW ��O�A��]n��Ε��r3�\n"���p�饚6W�jJ���9W�f=��9ml��dR�y/Ք��7�^�i ��M*Ք�^nz��l��^�)��妗j�(��dRͱ/7�TS*��M//7�TS��&�jZ��o��TSR�7�@�)�o��TӺ��5{%��+��ۆ�w6-��=�e�_}m�)~��ʅ��ژ���: #�j�]��u����>

Upvotes: 2

Views: 6700

Answers (3)

Jörn Franke

Reputation: 186

You can also use the HadoopOffice library, which allows you to read/write Excel with Hadoop and Spark. It is available on Maven Central and Spark packages.

https://github.com/ZuInnoTe/hadoopoffice/wiki

Upvotes: 0

Gyanendra Dwivedi

Reputation: 5538

I know this is a bit late, but someone has since created an Excel InputFormat as a standard solution for this kind of problem. Read this: https://sreejithrpillai.wordpress.com/2014/11/06/excel-inputformat-for-hadoop-mapreduce/

There is also a GitHub project with the code base.

Look here: https://github.com/sreejithpillai/ExcelRecordReaderMapReduce/

Upvotes: 0

jkovacs

Reputation: 3530

I don't know whether anyone has actually developed a custom InputFormat for MS Excel files (I doubt it, and a quick search turns up nothing), but you most certainly cannot read an Excel file using TextInputFormat. XLS files are binary.
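To illustrate the "binary" point, here is a minimal sketch that classifies a file by its first bytes. Legacy .xls workbooks are OLE2 compound files and start with the signature D0 CF 11 E0 A1 B1 1A E1, while .xlsx workbooks are ZIP archives and start with "PK" followed by 03 04. The class and method names (ExcelSniffer, classify) are made up for this example and are not part of Hadoop.

```java
import java.util.Arrays;

/** Sketch only: ExcelSniffer and its methods are hypothetical names, not a Hadoop API. */
final class ExcelSniffer {
    // OLE2 compound-file signature used by legacy .xls workbooks.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local-file-header signature; .xlsx workbooks are ZIP archives.
    private static final byte[] ZIP_MAGIC = { 'P', 'K', 0x03, 0x04 };

    /** Returns true if data begins with the given prefix. */
    static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    /** Classifies the leading bytes of a file; anything unrecognized is "unknown". */
    static String classify(byte[] head) {
        if (startsWith(head, OLE2_MAGIC)) {
            return "xls";
        }
        if (startsWith(head, ZIP_MAGIC)) {
            return "xlsx-or-other-zip";
        }
        return "unknown";
    }
}
```

Passing the first eight or so bytes of the input file to ExcelSniffer.classify is enough. Either Excel result means that a line-oriented text reader will split the file at arbitrary byte positions and hand the mapper garbage.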

Solution: Export your Excel file to CSV or TSV; you will then be able to load it using TextInputFormat.
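Once the data is exported, TextInputFormat hands each line to the mapper as a Text value, so the per-line work is just splitting the record. Below is a minimal sketch of that step in plain Java, without Hadoop dependencies so it can be run on its own. The class name CsvLineParser, the column indices, and the sample record are assumptions, since the question does not show the actual column layout. It uses a naive split on commas, so it does not handle quoted fields that contain commas.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

/** Sketch only: CsvLineParser is a hypothetical helper, not part of Hadoop. */
final class CsvLineParser {

    /**
     * Splits one exported CSV line and picks out a key column and a value column.
     * Returns null for blank lines or lines with too few columns, so the caller
     * can skip them instead of failing the task.
     */
    static Map.Entry<String, String> parse(String line, int keyCol, int valueCol) {
        if (line == null || line.trim().isEmpty()) {
            return null;
        }
        // Limit -1 keeps trailing empty fields, e.g. "a,b," yields three fields.
        String[] fields = line.split(",", -1);
        if (keyCol < 0 || valueCol < 0
                || keyCol >= fields.length || valueCol >= fields.length) {
            return null;
        }
        return new SimpleEntry<>(fields[keyCol].trim(), fields[valueCol].trim());
    }
}
```

In a real job, this logic would sit inside the map method of MapClass: call CsvLineParser.parse(value.toString(), keyCol, valueCol) and, if the result is not null, emit it with context.write(new Text(entry.getKey()), new Text(entry.getValue())). For TSV, split on "\t" instead of ",".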

Upvotes: 5
