Avik Aggarwal

Reputation: 619

Drill failing to read most of the columns in Parquet generated by Spark

I am running Drill 1.15 in distributed mode on the data nodes only (3 nodes with 32 GB of memory each). I am trying to read a Parquet file in HDFS that was generated by a Spark job.

The generated file reads just fine in Spark, but when reading it in Drill, all but a few of the columns fail with the following error:

org.apache.drill.common.exceptions.UserRemoteException: DATA_READ ERROR: Exception occurred while reading from disk. File: [file_name].parquet Column: Line Row Group Start: 111831 File: [file_name].parquet Column: Line Row Group Start: 111831 Fragment 0:0 [Error Id: [Error_id] on [host]:31010]

In the Drill storage configuration for dfs, I have the default config for the parquet format.

I am trying to run a simple query:

select * from dfs.`/hdfs/path/to/parquet/file.parquet`

The file size is also only in the tens of MBs, so it is not large.

I am using Spark 2.3 to generate the Parquet file and Drill 1.15 to read it.

Is there any configuration I am missing, or something else I should check?

Upvotes: 0

Views: 277

Answers (1)

Vitalii Diravka

Reputation: 855

This looks like a bug.
Please create a Jira ticket and attach the file.parquet and the log files.
Thanks

Upvotes: 1

Related Questions