How to read ORC file in hadoop streaming?

Question

I'd like to read ORC files in my mapreduce on Python. I try to run it:

hadoop jar /usr/lib/hadoop/lib/hadoop-streaming-2.6.0.2.2.6.0-2800.jar 
-file /hdfs/price/mymapper.py 
-mapper '/usr/local/anaconda/bin/python mymapper.py' 
-file /hdfs/price/myreducer.py 
-reducer '/usr/local/anaconda/bin/python myreducer.py' 
-input /user/hive/orcfiles/* 
-libjars /usr/hdp/2.2.6.0-2800/hive/lib/hive-exec.jar 
-inputformat org.apache.hadoop.hive.ql.io.orc.OrcInputFormat 
-numReduceTasks 1 
-output /user/hive/output

But I get error:

-inputformat : class not found : org.apache.hadoop.hive.ql.io.orc.OrcInputFormat

I found a similar question OrcNewInputformat as a inputformat for hadoop streaming but answer is not clear

Please, give me example how to read ORC files correctly in hadoop streaming.

How to read ORC file in hadoop streaming?

Answers (1)

Related Questions