jcools

Reputation: 69

Spark not able to read Erasure coded parquet files in Hadoop 3

I have built Hadoop 3.2.0 on a RHEL 6.7 Linux box with the Intel ISA-L library, and enabled native library support in the Hadoop installation.

I have copied some parquet format files onto this test cluster using "hadoop fs -copyFromLocal" with the RS-6-3-1024k erasure coding policy. However, when I try to read these parquet files using Spark 2.4.3, I get an exception as below.

    Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-330495766-11.57.1.147-1565544876888:blk_-9223372036854684144_16436 file=/mydata/testenv/app_data/DATE=20190812/part-r-00075-afc16916-7d0c-42bb-bb20-d0240b4431d8.snappy.parquet
        at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:984)
        at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:642)
        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:882)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:735)
        at java.io.FilterInputStream.read(FilterInputStream.java:83)
        at org.apache.parquet.io.DelegatingSeekableInputStream.read(DelegatingSeekableInputStream.java:61)
        .....
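
For reference, the data was written roughly like this (a sketch, assuming the RS-6-3-1024k policy was applied at the directory level; the path is taken from the trace above):

    # Apply the erasure coding policy to the target directory, then verify it
    hdfs ec -setPolicy -path /mydata/testenv/app_data -policy RS-6-3-1024k
    hdfs ec -getPolicy -path /mydata/testenv/app_data

    # Copy the parquet files in; newly written files are striped per the policy
    hadoop fs -copyFromLocal DATE=20190812 /mydata/testenv/app_data/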

Please note that I am able to copy these files from HDFS to local using the hadoop command, the HDFS web interface, etc. without any issues. hadoop fsck also reports the path where the files were copied as healthy.
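
Concretely, these checks all succeed (a sketch; the file name is the one from the trace):

    # Reading the same file back through the hadoop CLI works fine
    hadoop fs -copyToLocal /mydata/testenv/app_data/DATE=20190812/part-r-00075-afc16916-7d0c-42bb-bb20-d0240b4431d8.snappy.parquet /tmp/

    # fsck reports the path as healthy
    hdfs fsck /mydata/testenv/app_data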

Note: I am running the Hadoop cluster on RHEL 7.5, though I built the libraries on RHEL 6.7. However, when I run the "hadoop checknative" command I don't see any errors, and the ISA-L library shows as enabled correctly, i.e. the output reads "true" next to it.
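
The check in question (library paths in the output will differ per install):

    # List the native libraries Hadoop picked up, including ISA-L;
    # with -a the command exits non-zero if any library is missing
    hadoop checknative -a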

Upvotes: 2

Views: 360

Answers (1)

user3416463

Reputation: 1

You should set allow_erasure_coded_files=true on Impala.
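
For example, as an impalad startup flag (how you pass it depends on your deployment, e.g. a cluster manager's impalad flags):

    # Impala refuses to read erasure-coded files unless this startup flag is set
    impalad --allow_erasure_coded_files=true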

Upvotes: 0
