Hellen

Reputation: 3532

How to use Hadoop InputFormats in Apache Spark?

I have a class ImageInputFormat in Hadoop which reads images from HDFS. How can I use my InputFormat in Spark?

Here is my ImageInputFormat:

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class ImageInputFormat extends FileInputFormat<Text, ImageWritable> {

    // Hand each split to the custom reader that decodes the image bytes.
    @Override
    public ImageRecordReader createRecordReader(InputSplit split,
                  TaskAttemptContext context) throws IOException, InterruptedException {
        return new ImageRecordReader();
    }

    // Image files must be read as a whole, so they are never split.
    @Override
    protected boolean isSplitable(JobContext context, Path filename) {
        return false;
    }
}

Upvotes: 11

Views: 11045

Answers (2)

vijay kumar

Reputation: 2049

Will the images all be stored in a HadoopRDD?

Yes, everything loaded into Spark is stored as RDDs.

Can I set the RDD capacity so that when the RDD is full, the rest of the data is stored on disk?

The default storage level in Spark is StorageLevel.MEMORY_ONLY. Use MEMORY_ONLY_SER, which is more space-efficient; for spilling overflow to disk, the MEMORY_AND_DISK levels do exactly that. Please refer to the Spark documentation > Scala programming guide > RDD Persistence.
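
A minimal sketch in Java of persisting an RDD with that storage level; the input path and data here are hypothetical:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.storage.StorageLevel;

public class PersistExample {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("PersistExample"));

        // Hypothetical input path.
        JavaRDD<String> lines = sc.textFile("hdfs:///path/to/data");

        // MEMORY_ONLY_SER keeps partitions in memory as serialized bytes,
        // which is more compact than the default MEMORY_ONLY; use
        // MEMORY_AND_DISK_SER instead if overflow should spill to disk.
        lines.persist(StorageLevel.MEMORY_ONLY_SER());

        System.out.println(lines.count());
        sc.stop();
    }
}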

Will performance be affected if the data is too big?

Yes, as the data size increases it will affect performance as well.

Upvotes: 2

Robert Metzger

Reputation: 4542

The SparkContext has a method called hadoopFile. It accepts classes implementing the interface org.apache.hadoop.mapred.InputFormat.

Its description says "Get an RDD for a Hadoop file with an arbitrary InputFormat". Note that your ImageInputFormat extends the new org.apache.hadoop.mapreduce.lib.input.FileInputFormat, so the matching method for it is newAPIHadoopFile, which accepts InputFormats from the org.apache.hadoop.mapreduce API.
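
For example, a minimal sketch in Java, assuming the ImageInputFormat and ImageWritable classes above are on the classpath (the HDFS path is hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class ImageLoader {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("ImageLoader"));

        // newAPIHadoopFile matches InputFormats from the new
        // org.apache.hadoop.mapreduce API, like ImageInputFormat above.
        JavaPairRDD<Text, ImageWritable> images = sc.newAPIHadoopFile(
                "hdfs:///path/to/images",   // hypothetical input path
                ImageInputFormat.class,
                Text.class,                 // key type from ImageRecordReader
                ImageWritable.class,        // value type: the image itself
                new Configuration());

        System.out.println("Loaded " + images.count() + " images");
        sc.stop();
    }
}

The result is a pair RDD of (Text, ImageWritable) records that can be processed with the usual RDD transformations.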

Also have a look at the Spark Documentation.

Upvotes: 14
