Aman Sehgal

Reputation: 556

Load XML file to dataframe in PySpark using DBR 7.3.x+

I'm trying to load an XML file into a DataFrame using PySpark in a Databricks notebook.

df = spark.read.format("xml").options(
    rowTag="product", mode="PERMISSIVE", columnNameOfCorruptRecord="error_record"
).load(filePath)

Doing so, I get the following error:

Could not initialize class com.databricks.spark.xml.util.PermissiveMode$

Databricks runtime version: 7.3 LTS, Spark version: 3.0.1, Scala version: 2.12

The same code block runs perfectly fine on DBR 6.4 (Spark 2.4.5, Scala 2.11).

Upvotes: 1

Views: 2008

Answers (1)

Alex Ott

Reputation: 87299

You need to upgrade the spark-xml library to a build compiled for Scala 2.12, because the build that works on DBR 6.4 (Scala 2.11) isn't compatible with the newer Scala version. So instead of spark-xml_2.11, you need to use spark-xml_2.12.

P.S. I just checked with DBR 7.3 LTS and com.databricks:spark-xml_2.12:0.11.0, and it works just fine.
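
To make the naming rule concrete, here is a minimal sketch of how the Maven coordinate follows from the cluster's Scala version. The helper name `spark_xml_coordinate` and the `SCALA_BY_RUNTIME` mapping are hypothetical, made up for illustration; the only runtime/Scala pairs in it are the two mentioned in this thread, and 0.11.0 is simply the version tested above, not necessarily the latest.

```python
# Hypothetical mapping, limited to the two runtimes mentioned in this thread.
SCALA_BY_RUNTIME = {
    "6.4": "2.11",      # DBR 6.4, Spark 2.4.5
    "7.3 LTS": "2.12",  # DBR 7.3 LTS, Spark 3.0.1
}


def spark_xml_coordinate(scala_binary_version: str, library_version: str) -> str:
    """Build the spark-xml Maven coordinate for a given Scala binary version.

    The Scala binary version is the suffix of the artifact name, so it has
    to match the Scala version the cluster's Spark was built with.
    """
    return f"com.databricks:spark-xml_{scala_binary_version}:{library_version}"


# 0.11.0 is the version the answer reports as working on DBR 7.3 LTS.
coordinate = spark_xml_coordinate(SCALA_BY_RUNTIME["7.3 LTS"], "0.11.0")
print(coordinate)
```

On Databricks you would install the resulting coordinate on the cluster as a Maven library. Outside Databricks, the same string can usually be passed through the `spark.jars.packages` setting when the session is created.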

Upvotes: 2
