Anand

Reputation: 11

How to read a decimal value from a Parquet file using Morphline readAvroParquetFile and Solr

I have a Hive table with two columns (name string, salary decimal(10,3)) stored in Parquet format. While indexing it with Morphline and Solr, I get the following exception:

ERROR morphline.MorphlineMapRunner: Unable to process file <parquet file>
java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to java.io.InputStream
        at org.kitesdk.morphline.stdio.AbstractParser.getAttachmentInputStream(AbstractParser.java:184)
        at org.kitesdk.morphline.stdio.AbstractParser.doProcess(AbstractParser.java:94)
        at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:161)
        at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
        at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:186)
        at org.kitesdk.morphline.stdlib.ConvertTimestampBuilder$ConvertTimestamp.doProcess(ConvertTimestampBuilder.java:161)
        at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:161)
        at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
        at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:186)
        at org.kitesdk.morphline.stdlib.GenerateUUIDBuilder$GenerateUUID.doProcess(GenerateUUIDBuilder.java:98)
        at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:161)
        at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
        at org.kitesdk.morphline.avro.ExtractAvroPathsBuilder$ExtractAvroPaths.doProcess(ExtractAvroPathsBuilder.java:143)
        at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:161)
        at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
        at org.kitesdk.morphline.hadoop.parquet.avro.ReadAvroParquetFileBuilder$ReadAvroParquetFile.extract(ReadAvroParquetFileBuilder.java:201)
        at org.kitesdk.morphline.hadoop.parquet.avro.ReadAvroParquetFileBuilder$ReadAvroParquetFile.doProcess(ReadAvroParquetFileBuilder.java:180)
        at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:161)
        at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:186)
        at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:161)
        at org.apache.solr.hadoop.morphline.MorphlineMapRunner.map(MorphlineMapRunner.java:208)
        at org.apache.solr.hadoop.MapReduceIndexerTool.dryRun(MapReduceIndexerTool.java:1250)
        at org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:875)
        at org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:700)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.solr.hadoop.MapReduceIndexerTool.main(MapReduceIndexerTool.java:687)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Exception in thread "main" org.kitesdk.morphline.api.MorphlineRuntimeException: java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to java.io.InputStream
        at org.kitesdk.morphline.base.FaultTolerance.handleException(FaultTolerance.java:73)
        at org.apache.solr.hadoop.morphline.MorphlineMapRunner.map(MorphlineMapRunner.java:220)
        at org.apache.solr.hadoop.MapReduceIndexerTool.dryRun(MapReduceIndexerTool.java:1250)
        at org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:875)
        at org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:700)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.solr.hadoop.MapReduceIndexerTool.main(MapReduceIndexerTool.java:687)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to java.io.InputStream
        at org.kitesdk.morphline.stdio.AbstractParser.getAttachmentInputStream(AbstractParser.java:184)
        at org.kitesdk.morphline.stdio.AbstractParser.doProcess(AbstractParser.java:94)
        at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:161)
        at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
        at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:186)
        at org.kitesdk.morphline.stdlib.ConvertTimestampBuilder$ConvertTimestamp.doProcess(ConvertTimestampBuilder.java:161)
        at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:161)
        at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
        at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:186)
        at org.kitesdk.morphline.stdlib.GenerateUUIDBuilder$GenerateUUID.doProcess(GenerateUUIDBuilder.java:98)
        at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:161)
        at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
        at org.kitesdk.morphline.avro.ExtractAvroPathsBuilder$ExtractAvroPaths.doProcess(ExtractAvroPathsBuilder.java:143)
        at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:161)
        at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
        at org.kitesdk.morphline.hadoop.parquet.avro.ReadAvroParquetFileBuilder$ReadAvroParquetFile.extract(ReadAvroParquetFileBuilder.java:201)
        at org.kitesdk.morphline.hadoop.parquet.avro.ReadAvroParquetFileBuilder$ReadAvroParquetFile.doProcess(ReadAvroParquetFileBuilder.java:180)
        at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:161)
        at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:186)
        at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:161)
        at org.apache.solr.hadoop.morphline.MorphlineMapRunner.map(MorphlineMapRunner.java:208)
        ... 11 more

The Morphline file contains the following:

{
  readAvroParquetFile {
   readerSchemaString:"""{"type":"record","name":"employee","fields":[
     {"name":"name","type":["string","null"],"default":""},
     {"name":"salary","type":["bytes","null"],"logicalType":"decimal","precision":10,"scale":4,"default":0}
   ]}"""
  }
}

How can I index the Parquet file of a table that contains a decimal column using Morphline and Solr? Any help is appreciated.

Upvotes: 1

Views: 433

Answers (1)

Wolfgang Hoschek

Reputation: 1

Per http://kitesdk.org/docs/current/morphlines/morphlines-reference-guide.html#readAvroParquetFile: "The morphline record input field file_upload_url must contain the HDFS Path of the Parquet file to read. (This field is already provided out of the box with MapReduceIndexerTool)."
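Once the records are being read, the decimal column itself still has to be decoded. As a minimal sketch, assuming the salary field arrives as Avro bytes carrying a decimal logical type (the Avro specification defines those bytes as the two's-complement, big-endian unscaled integer), the value could be rebuilt in plain Java as below. The class name AvroDecimalDecoder and the method decode are hypothetical helpers for illustration, not part of Morphline or Avro, and the scale of 3 is taken from the decimal(10,3) column in the question.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.ByteBuffer;

// Hypothetical helper, not part of Morphline or Avro.
public class AvroDecimalDecoder {

    // Rebuilds a BigDecimal from the unscaled two's-complement bytes
    // of an Avro decimal, using the scale declared in the schema.
    static BigDecimal decode(ByteBuffer buf, int scale) {
        byte[] unscaled = new byte[buf.remaining()];
        // duplicate() leaves the caller's buffer position untouched
        buf.duplicate().get(unscaled);
        return new BigDecimal(new BigInteger(unscaled), scale);
    }

    public static void main(String[] args) {
        // Assumed sample value: unscaled 12345678 with scale 3
        ByteBuffer raw = ByteBuffer.wrap(new BigInteger("12345678").toByteArray());
        System.out.println(decode(raw, 3)); // prints 12345.678
    }
}
```

The scale passed to decode has to match the one the data was written with; the question's table uses scale 3 while the reader schema declares scale 4, and that mismatch would shift the decoded value by a factor of ten.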

Upvotes: 0
