Miguel

Reputation: 181

Pyspark Avro write error SQLConf$LegacyBehaviorPolicy

I'm trying to write data with PySpark to an Avro file, but it gives me an error.

My code is:

spark = pyspark.sql.SparkSession.builder\
    .master("local[*]")\
    .appName("MiAplicacionSpark")\
    .config("spark.jars.packages", "org.apache.spark:spark-avro_2.12:3.5.3,org.apache.hadoop:hadoop-azure:3.2.0,com.microsoft.azure:azure-storage:8.6.0") \
    .getOrCreate()

...

df = spark.read \
    .format("jdbc") \
    .option("url", jdbc_url) \
    .option("dbtable", query) \
    .option("user", username) \
    .option("password", password) \
    .option("fetchsize", "20000") \
    .load()


df.write.format("avro").mode("overwrite").save(file_path + "_magic.avro")

the error is:

java.lang.NoClassDefFoundError: org/apache/spark/sql/internal/SQLConf$LegacyBehaviorPolicy$
    at org.apache.spark.sql.avro.AvroOutputWriter.<init>(AvroOutputWriter.scala:47)
    at org.apache.spark.sql.avro.AvroOutputWriterFactory.newInstance(AvroOutputWriterFactory.scala:43)
    at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.newOutputWriter(FileFormatDataWriter.scala:161)
    at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.<init>(FileFormatDataWriter.scala:146)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:389)
    at org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:100)

The Spark version in my PySpark installation is 3.5.3.
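This kind of NoClassDefFoundError usually points at the spark-avro artifact version not matching the Spark runtime that actually loads it. One way to surface that is to compare the version in the spark-avro coordinate against the runtime version (in a live session, `spark.version`); a minimal sketch of the string check, with the versions hard-coded here for illustration:

```python
def avro_matches_spark(spark_avro_coord: str, spark_version: str) -> bool:
    """Return True when the spark-avro artifact version equals the Spark runtime version."""
    # The version is the last ':'-separated field of the Maven coordinate,
    # e.g. "org.apache.spark:spark-avro_2.12:3.5.3" -> "3.5.3".
    artifact_version = spark_avro_coord.rsplit(":", 1)[1]
    return artifact_version == spark_version

# In a real session you would pass spark.version (or pyspark.__version__) here.
print(avro_matches_spark("org.apache.spark:spark-avro_2.12:3.5.3", "3.5.3"))  # True
print(avro_matches_spark("org.apache.spark:spark-avro_2.12:3.3.4", "3.5.3"))  # False
```

If the check fails, the JVM that runs the job is not the same Spark version as the package you pulled in, and the Avro writer can reference internal classes that no longer line up.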

I can't figure out what is causing this error.

Thanks in advance


UPDATE:

I have downgraded to PySpark 3.3.4 and now the error is:

Caused by: java.lang.NoSuchMethodError: org.apache.spark.sql.internal.SQLConf$.LEGACY_AVRO_REBASE_MODE_IN_WRITE()Lorg/apache/spark/internal/config/ConfigEntry;

.config("spark.jars.packages", "org.apache.spark:spark-avro_2.12:3.3.4,org.apache.hadoop:hadoop-azure:3.3.4,com.microsoft.azure:azure-storage:8.6.0") \

Additional info:

Java version: 11.0.16.1
Hadoop version: None
Spark version: 3.3.4
Loaded jars:
file:///C:/Users/losjfg/.ivy2/jars/org.apache.hadoop_hadoop-azure-3.3.4.jar
file:///C:/Users/losjfg/.ivy2/jars/com.microsoft.azure_azure-storage-8.6.0.jar
file:///C:/Users/losjfg/.ivy2/jars/org.apache.httpcomponents_httpclient-4.5.13.jar
file:///C:/Users/losjfg/.ivy2/jars/org.apache.hadoop.thirdparty_hadoop-shaded-guava-1.1.1.jar
file:///C:/Users/losjfg/.ivy2/jars/org.eclipse.jetty_jetty-util-ajax-9.4.43.v20210629.jar
file:///C:/Users/losjfg/.ivy2/jars/org.codehaus.jackson_jackson-mapper-asl-1.9.13.jar
file:///C:/Users/losjfg/.ivy2/jars/org.codehaus.jackson_jackson-core-asl-1.9.13.jar
file:///C:/Users/losjfg/.ivy2/jars/org.wildfly.openssl_wildfly-openssl-1.0.7.Final.jar
file:///C:/Users/losjfg/.ivy2/jars/org.apache.httpcomponents_httpcore-4.4.13.jar
file:///C:/Users/losjfg/.ivy2/jars/commons-logging_commons-logging-1.1.3.jar
file:///C:/Users/losjfg/.ivy2/jars/commons-codec_commons-codec-1.15.jar
file:///C:/Users/losjfg/.ivy2/jars/org.eclipse.jetty_jetty-util-9.4.43.v20210629.jar
file:///C:/Users/losjfg/.ivy2/jars/com.fasterxml.jackson.core_jackson-core-2.9.4.jar
file:///C:/Users/losjfg/.ivy2/jars/org.slf4j_slf4j-api-1.7.12.jar
file:///C:/Users/losjfg/.iv2/jars/org.apache.commons_commons-lang3-3.4.jar
file:///C:/Users/losjfg/.ivy2/jars/com.microsoft.azure_azure-keyvault-core-1.0.0.jar
file:///C:/Users/losjfg/.ivy2/jars/com.google.guava_guava-20.0.jar

Upvotes: 0

Views: 133

Answers (1)

Srinimf

Reputation: 56

It is a compatibility issue between Avro and Spark. Remove org.apache.spark:spark-avro_2.12:3.5.3 from spark.jars.packages, since the latest Spark version already includes it. Secondly, use the latest hadoop-azure version, "org.apache.hadoop:hadoop-azure:3.3.4". Restart your Spark session and it will resolve the issue.
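A sketch of the builder with those two changes applied (this follows the answer's suggestion; whether spark-avro is bundled depends on your Spark distribution, so treat it as an assumption and add the package back, pinned to your exact Spark version, if the format is still not found):

```python
import pyspark

# hadoop-azure bumped to 3.3.4 and the spark-avro coordinate dropped,
# per the answer above; app name and versions are from the question.
spark = pyspark.sql.SparkSession.builder \
    .master("local[*]") \
    .appName("MiAplicacionSpark") \
    .config("spark.jars.packages",
            "org.apache.hadoop:hadoop-azure:3.3.4,"
            "com.microsoft.azure:azure-storage:8.6.0") \
    .getOrCreate()
```

Stop any existing session (spark.stop()) before rebuilding, since spark.jars.packages only takes effect when the JVM starts.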

Upvotes: 0
