Martin Peng

Reputation: 87

Spark 2.4.1 cannot read Avro files from HDFS

I have a simple code block that writes a DataFrame in Avro format and then reads it back.

Writing succeeds and the Avro files are generated in HDFS. However, an AbstractMethodError is thrown when I read the files. Can anyone shed some light on this?

Since the Avro library is already built into Spark 2.4.x, I used Spark's own library by adding the package org.apache.spark:spark-avro_2.11:2.4.1 to the Spark interpreter of my Zeppelin notebook.

My simple code block:

%pyspark

from pyspark.sql import Row

# Four sample rows to round-trip through Avro
test_rows = [
    Row(file_name="test-guangzhou1", topic="camera1", timestamp=1, msg="Test1"),
    Row(file_name="test-guangzhou1", topic="camera1", timestamp=2, msg="Test2"),
    Row(file_name="test-guangzhou3", topic="camera3", timestamp=3, msg="Test3"),
    Row(file_name="test-guangzhou1", topic="camera1", timestamp=4, msg="Test4"),
]

test_df = spark.createDataFrame(test_rows)

# Parentheses let the chained call span several lines
(test_df.write.format("avro")
    .mode("overwrite")
    .save("hdfs:///tmp/bag_parser279181359_3"))

loaded_df = spark.read.format("avro").load("hdfs:///tmp/bag_parser279181359_3")

loaded_df.show()

The error message I saw:

Py4JJavaError: An error occurred while calling o701.collectToPython.
: java.lang.AbstractMethodError
    at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD$lzycompute(DataSourceScanExec.scala:337)
    at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD(DataSourceScanExec.scala:331)
    at org.apache.spark.sql.execution.FileSourceScanExec.inputRDDs(DataSourceScanExec.scala:357)
    at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:627)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:137)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:133)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:161)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:158)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:133)
    at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:289)
    at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:381)
    at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
    at org.apache.spark.sql.Dataset$$anonfun$collectToPython$1.apply(Dataset.scala:3259)
    at org.apache.spark.sql.Dataset$$anonfun$collectToPython$1.apply(Dataset.scala:3256)
    at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3373)
    at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:79)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:144)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3367)
    at org.apache.spark.sql.Dataset.collectToPython(Dataset.scala:3256)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)

(<class 'py4j.protocol.Py4JJavaError'>, Py4JJavaError(u'An error occurred while calling o701.collectToPython.\n', JavaObject id=o702), <traceback object at 0x7fc031b5c878>)

Upvotes: 1

Views: 547

Answers (2)

Jonathan Kelly

Reputation: 1990

There is a similar but different question about using spark-avro on emr-5.28.0. Its cause is not the same as the one discussed here (this question was asked long before emr-5.28.0 was available), but it is similar enough that I figured I'd link to my answer for that one, in case anybody lands on this question because of the similar-looking stack trace and similar-sounding title.

Upvotes: 0

Ram Ghadiyaram

Reputation: 29237

AbstractMethodError:

Thrown when an application tries to call an abstract method. Normally, this error is caught by the compiler; this error can only occur at run time if the definition of some class has incompatibly changed since the currently executing method was last compiled.

AFAIK, you need to check which versions the code was compiled against and which versions you are actually running on.
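As a rough way to do that check, the sketch below compares the package coordinate from the question with the versions of the running cluster. The helper name check_spark_package and the runtime values passed to it are made up for illustration; it only assumes the usual naming of Spark packages, where the artifact ends in the Scala binary version and the package version matches the Spark release it was built for. It cannot prove that a mismatch is the cause here, only point out one.

```python
def check_spark_package(coordinate, spark_version, scala_version):
    """Return a list of mismatches between a Spark package and the runtime.

    coordinate    -- Maven coordinate, e.g. "org.apache.spark:spark-avro_2.11:2.4.1"
    spark_version -- Spark version of the running cluster, e.g. the value of spark.version
    scala_version -- Scala version of the running cluster, e.g. "2.11.12"
    """
    _group, artifact, package_version = coordinate.split(":")
    # The artifact name ends in "_<Scala binary version>", e.g. "spark-avro_2.11".
    _name, _sep, package_scala = artifact.rpartition("_")
    # Only major.minor is relevant for Scala binary compatibility.
    runtime_scala = ".".join(scala_version.split(".")[:2])

    problems = []
    if package_version != spark_version:
        problems.append(
            "package built for Spark %s, cluster runs Spark %s"
            % (package_version, spark_version)
        )
    if package_scala != runtime_scala:
        problems.append(
            "package built for Scala %s, cluster runs Scala %s"
            % (package_scala, runtime_scala)
        )
    return problems


# Hypothetical runtime values, chosen only to show a mismatch being reported.
for problem in check_spark_package(
    "org.apache.spark:spark-avro_2.11:2.4.1", "2.4.0", "2.11.12"
):
    print(problem)
```

In the notebook, spark.version gives the Spark version of the running session, so it can be passed as the second argument instead of a hard-coded string. Vendor builds may append a suffix to that value, in which case the exact comparison above would need loosening.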

Upvotes: 1
