maks

Reputation: 11

Reading an XML file with self-closing tags and many attributes in Scala

I am reading an XML file in Scala:

<tag1>
  <tag2 id="0" attr1="abc" ... />
  ..
</tag1>

This was already reported as an issue and closed: https://github.com/databricks/spark-xml/pull/303

However, I am not able to resolve this.

import org.apache.spark.sql.SparkSession
import com.databricks.spark.xml._
import org.apache.spark.sql.types.{StructType, StructField, DoubleType,StringType}
import org.apache.spark.sql.{Row, SaveMode}

object stack {
  def main(args: Array[String]) {
    val spark = SparkSession.builder.getOrCreate()

    val customSchema = StructType(Array(
      StructField("id", DoubleType, nullable = true),
      StructField("attr1", StringType, nullable = true),
      ...
      ...
    ))  
    val df = spark.read
        .option("rowTag", "tag2")
        .format("com.databricks.spark.xml")
        .schema(customSchema)
        .load("dummy.xml")

    import spark.sql
    import spark.implicits._

    df.createOrReplaceTempView("temp1")
    sql("SELECT * from temp1 limit 5").show()
  }
}

However, df.show(5) displays no rows.

The resolution talks about using XmlInputFormat, which I have not tried; if someone can guide me, it would be helpful.
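For reference, the low-level route would look roughly like this: a hypothetical, untested sketch that reads raw record strings with spark-xml's XmlInputFormat via newAPIHadoopFile. The start/end markers and the file name are assumptions based on the sample above; with self-closing tags the end marker would have to be the "/>" sequence rather than a closing tag, which is exactly what the linked PR is about.

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.spark.sql.SparkSession
import com.databricks.spark.xml.XmlInputFormat

object RawXmlRead {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.config("spark.master", "local").getOrCreate()
    val conf = spark.sparkContext.hadoopConfiguration
    // Delimit each record on the <tag2 .../> element (assumed markers).
    conf.set(XmlInputFormat.START_TAG_KEY, "<tag2")
    conf.set(XmlInputFormat.END_TAG_KEY, "/>")
    val records = spark.sparkContext
      .newAPIHadoopFile("dummy.xml", classOf[XmlInputFormat],
        classOf[LongWritable], classOf[Text])
      .map { case (_, text) => text.toString }
    // One raw XML string per tag2 element; parse the attributes yourself.
    records.collect().foreach(println)
    spark.stop()
  }
}
```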

A similar solution works with a nested XML file:

<books>
  <book> .. </book>
  <name> abc </name>
</books>

I want to see the DataFrame populated with values. Later, I want to read many XML files and join them in a SQL query.

Upvotes: 1

Views: 1016

Answers (2)

maks

Reputation: 11

Thanks, Mikhail, for providing guidance; however, the issue was very small. Sorry for not providing the actual XML record earlier, as the issue was in the attributes.

<users>
    <row Id="-1" Reputation="1" ..... />
</users>

The attribute names started with capital letters; when I made them lowercase, my solution started working (of course, I printed the schema before using it, as suggested by Mikhail).
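Worth noting: XML attribute names are case-sensitive, so a schema field for id will never match an attribute written Id. A minimal, Spark-free sketch with the JDK's built-in DOM parser, using the record shown above, demonstrates the mismatch:

```scala
import java.io.ByteArrayInputStream
import javax.xml.parsers.DocumentBuilderFactory

object AttributeCase {
  def main(args: Array[String]): Unit = {
    val xml = """<users><row Id="-1" Reputation="1"/></users>"""
    val doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
      .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")))
    val attrs = doc.getElementsByTagName("row").item(0).getAttributes
    // Attribute lookup is case-sensitive: "Id" is present, "id" is not.
    println(Option(attrs.getNamedItem("Id")).map(_.getNodeValue))  // Some(-1)
    println(Option(attrs.getNamedItem("id")))                      // None
  }
}
```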

Upvotes: 0

Mikhail Ionkin

Reputation: 625

You need to add a _ prefix to the attribute names in your schema (spark-xml's default attributePrefix).

Data (dummy.xml):

<tag1>
    <tag2 id="0" attr1="abc"/>
    <tag2 id="1" attr1="abd" />
    <tag2 id="2" attr1="abd" />
</tag1>

Solution:

package main

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

object Main extends App {
  val spark = SparkSession.builder.config("spark.master", "local").getOrCreate()

  val customSchema = StructType(Array(
    StructField("_id", DoubleType, nullable = true),
    StructField("_attr1", StringType, nullable = true)
  ))
  val df = spark.read
    .option("rowTag", "tag2")
    .format("com.databricks.spark.xml")
    .schema(customSchema)
    .load("dummy.xml")
  import spark.sql

  df.createOrReplaceTempView("temp1")
  sql("SELECT * from temp1 limit 5").show()
}

Result:

+---+------+
|_id|_attr1|
+---+------+
|0.0|   abc|
|1.0|   abd|
|2.0|   abd|
+---+------+

How I got it:

  1. Figured the problem was with the schema, because it works with child elements
  2. Removed (or commented out) the custom schema (// .schema(customSchema))
  3. Printed the schema that Spark resolves (df.printSchema())
  4. Found the fields I needed
  5. Created a new schema
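The steps above can be sketched as a small stand-alone program (assuming a local SparkSession and the same dummy.xml as in the solution):

```scala
import org.apache.spark.sql.SparkSession

object InferSchema extends App {
  val spark = SparkSession.builder.config("spark.master", "local").getOrCreate()
  // Step 2: no .schema(customSchema) — let spark-xml infer the schema instead.
  val inferred = spark.read
    .option("rowTag", "tag2")
    .format("com.databricks.spark.xml")
    .load("dummy.xml")
  inferred.printSchema()  // Step 3: prints the resolved names (e.g. _id, _attr1)
  // Steps 4-5: copy the resolved names/types into a new StructType
  // and pass it via .schema(...) once they match what you need.
  spark.stop()
}
```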

See also: Extracting tag attributes from xml using sparkxml

PS: Sorry for my English

Upvotes: 1
