pyspark: org.xml.sax.SAXParseException Current config of the parser doesn't allow a maxOccurs attribute value to be set greater than the value 5,000

Question

I am trying to parse xml files with XSD using spark-xml library in pyspark. Below is the code :

xml_df = spark.read.format("com.databricks.spark.xml") \
    .option("rootTag", "Document") \
    .option("rowTag", "row01") \
    .option("rowValidationXSDPath","auth.011.001.02_ABC_1.1.0.xsd") \
    .load("/mnt/bronze/ABC-3.xml")

I am getting error as org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (10.0.1.4 executor driver): java.util.concurrent.ExecutionException: org.xml.sax.SAXParseException; systemId: file:/local_disk0/auth.011.001.02_ABC_1.1.0.xsd; lineNumber: 5846; columnNumber: 99; Current configuration of the parser doesn't allow a maxOccurs attribute value to be set greater than the value 5,000.

I have looked for the ways to setup jdk.xml.maxOccurLimit=0 in databricks cluster but didn't find any.

Any help on solving this error will be highly appreciated.

pyspark: org.xml.sax.SAXParseException Current config of the parser doesn't allow a maxOccurs attribute value to be set greater than the value 5,000

Answers (1)

Related Questions

pyspark: org.xml.sax.SAXParseException Current config of the parser doesn&#39;t allow a maxOccurs attribute value to be set greater than the value 5,000

Answers (1)

Related Questions

pyspark: org.xml.sax.SAXParseException Current config of the parser doesn't allow a maxOccurs attribute value to be set greater than the value 5,000