Sindu_

Reputation: 1455

How to create a custom schema to read XML in Scala

I am trying to define a custom schema for the following XML using Spark and Scala.

<CONTAINER>
    <TABLE attrname="Wood">
        <ROWDATA>
            <ROW Weight="55" Length="11" Unit="5"/>
        </ROWDATA>
    </TABLE>
    <TABLE attrname="Metal">
        <ROWDATA>
            <ROW Weight="66" Length="23" Unit="0"/>
            <ROW Weight="44" Length="22" Unit="0"/>
            <ROW Weight="33" Length="21" Unit="1"/>
        </ROWDATA>
    </TABLE>
    <TABLE attrname="Plastic">
        <ROWDATA>
            <ROW Weight="33" Length="11" Unit="0"/>
        </ROWDATA>
    </TABLE>
</CONTAINER>

This is the code I have tried, but printing the data frame gives no output. I also need the attribute names in the data frame. I would really appreciate some help specifying the schema correctly.

    val xmlDFF = session.read
      .option("rootTag", "CONTAINER")
      .option("rowTag", "TABLE")
      .schema(getContainderSchema)
      .xml(filePath)

    def getContainderSchema: StructType = {
      val row = new StructType()
        .add("_Weight", StringType)
        .add("_Length", StringType)
        .add("_Unit", StringType)

      val rowdata = new StructType()
        .add("ROWDATA", ArrayType(row))

      rowdata
    }

Upvotes: 0

Views: 364

Answers (1)

User9123

Reputation: 1733

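Your schema has to describe the "TABLE" element, because that is the rowTag. Add a "TABLE" type that holds the attribute as "_attrname" and wraps "ROWDATA", which in turn contains an array of "ROW":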

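  import org.apache.spark.sql.types._

  // One <ROW .../> element; XML attributes get the "_" prefix.
  val rowType = new StructType()
    .add("_Weight", StringType)
    .add("_Length", StringType)
    .add("_Unit", StringType)

  // <ROWDATA> contains a repeated <ROW>, hence the array.
  val rowDataType = new StructType()
    .add("ROW", ArrayType(rowType))

  // <TABLE attrname="..."> is the rowTag, so this is the top-level schema.
  val tableType = new StructType()
    .add("_attrname", StringType)
    .add("ROWDATA", rowDataType)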

and use it:

.schema(tableType)
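
To show how the pieces might fit together, here is a minimal sketch that reads the file with this schema and then flattens it to one row per ROW element, so the attribute name sits next to each weight/length/unit. It assumes the spark-xml data source (or a Spark version with built-in XML support) is on the classpath and uses `.format("xml").load(...)` instead of the `.xml(...)` shortcut. The object name `ContainerXml` and the helpers `readTables` and `flattenRows` are made up for illustration. As far as I know, `rootTag` only matters when writing XML, so it is left out here.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}
import org.apache.spark.sql.types.{ArrayType, StringType, StructType}

object ContainerXml {
  // Schema from the answer: one TABLE element becomes one DataFrame row.
  val rowType: StructType = new StructType()
    .add("_Weight", StringType)
    .add("_Length", StringType)
    .add("_Unit", StringType)

  val rowDataType: StructType = new StructType()
    .add("ROW", ArrayType(rowType))

  val tableType: StructType = new StructType()
    .add("_attrname", StringType)
    .add("ROWDATA", rowDataType)

  // Hypothetical helper: read the file using TABLE as the row tag.
  // Assumes an "xml" data source is available to Spark.
  def readTables(session: SparkSession, filePath: String): DataFrame =
    session.read
      .format("xml")
      .option("rowTag", "TABLE")
      .schema(tableType)
      .load(filePath)

  // Hypothetical helper: explode the nested ROW array so that each
  // ROW element becomes its own row, tagged with the table's attrname.
  def flattenRows(tables: DataFrame): DataFrame =
    tables
      .select(
        col("_attrname").as("attrname"),
        explode(col("ROWDATA.ROW")).as("row"))
      .select(
        col("attrname"),
        col("row._Weight").as("Weight"),
        col("row._Length").as("Length"),
        col("row._Unit").as("Unit"))
}
```

Something like `ContainerXml.flattenRows(ContainerXml.readTables(session, filePath)).show()` should then print the columns attrname, Weight, Length and Unit, one line per ROW.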

Upvotes: 1
