Sindu_

Reputation: 1455

How to pass a custom schema depending on the XML attribute name in Scala

I need to define a custom schema for the XML below. Each TABLE element has different columns depending on its attrname attribute, so I want to define a different custom schema for each attrname value.

<CASE>
    <TABLE attrname="Wood">
        <ROWDATA>
            <ROW Weight="55" Length="11" color="Black"/>
        </ROWDATA>
    </TABLE>
    <TABLE attrname="Metal">
        <ROWDATA>
            <ROW Type="AL" Weight="66" Length="23" Unit="0"/>
            <ROW Type="AL" Weight="44" Length="22" Unit="0"/>
            <ROW Type="AL" Weight="33" Length="21" Unit="1"/>
        </ROWDATA>
    </TABLE>
    <TABLE attrname="Plastic">
        <ROWDATA>
            <ROW color="Blue" Grade="A" Price="112"/>
        </ROWDATA>
    </TABLE>
</CASE>

The code below reads the XML with a single fixed schema, but is there a way to choose the schema after checking the attribute name? For example, if a TABLE's attrname is "Plastic", I want to read it with the following schema.

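import org.apache.spark.sql.types.{ArrayType, StringType, StructType}
import com.databricks.spark.xml._ // spark-xml package: adds .xml(...) to DataFrameReader

// Schema for TABLE elements whose attrname is "Plastic".
// spark-xml prefixes XML attributes with "_" by default.
def getPlasticSchema: StructType = {
  val rowType = new StructType()
    .add("_color", StringType)
    .add("_Grade", StringType)
    .add("_Price", StringType)

  val rowDataType = new StructType()
    .add("ROW", ArrayType(rowType))

  new StructType()
    .add("_attrname", StringType)
    .add("ROWDATA", rowDataType)
}

val xmlDFF = session.read
  .option("rootTag", "CASE")
  .option("rowTag", "TABLE") // each TABLE element becomes one DataFrame row
  .schema(getPlasticSchema)
  .xml(filePath)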
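Because all three tables share the same TABLE / ROWDATA / ROW shape and only the ROW attributes differ, one option is to build each schema from a list of attribute names and keep them in a map keyed by attrname. The sketch below assumes the `"xml"` data source is available (the spark-xml package, or Spark 4.0+ where XML is built in), the default `"_"` attribute prefix, and string-typed values; `TableSchemas`, `tableSchema`, `schemasByAttrName` and `readTables` are hypothetical names for illustration. Each table type is read with its own schema and then filtered on `_attrname`, since the reader itself does not pick a schema per element.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{ArrayType, StringType, StructType}

object TableSchemas {

  // Builds the TABLE -> ROWDATA -> ROW[] schema for one table type.
  // Assumes the default "_" attribute prefix and reads every value as a string.
  def tableSchema(rowAttributes: Seq[String]): StructType = {
    val rowType = rowAttributes.foldLeft(new StructType()) { (st, name) =>
      st.add(s"_$name", StringType)
    }
    new StructType()
      .add("_attrname", StringType)
      .add("ROWDATA", new StructType().add("ROW", ArrayType(rowType)))
  }

  // Hypothetical lookup table: attrname value -> schema for that table's rows,
  // using the columns from the sample XML.
  val schemasByAttrName: Map[String, StructType] = Map(
    "Wood"    -> tableSchema(Seq("Weight", "Length", "color")),
    "Metal"   -> tableSchema(Seq("Type", "Weight", "Length", "Unit")),
    "Plastic" -> tableSchema(Seq("color", "Grade", "Price"))
  )

  // Reads the file once per table type with that type's schema,
  // then keeps only the TABLE elements whose attrname matches.
  def readTables(spark: SparkSession, filePath: String): Map[String, DataFrame] =
    schemasByAttrName.map { case (attrName, schema) =>
      val df = spark.read
        .format("xml")
        .option("rowTag", "TABLE")
        .schema(schema)
        .load(filePath)
        .filter(col("_attrname") === attrName)
      attrName -> df
    }
}
```

Usage would look like `TableSchemas.readTables(session, filePath)("Plastic").show()`. `tableSchema(Seq("color", "Grade", "Price"))` produces the same structure as `getPlasticSchema` above. The trade-off is that the file is scanned once per table type.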

Answers (1)

User9123

Reputation: 1733

It doesn't look like the TABLE elements can be filtered by attribute value during reading, but you can filter the DataFrame afterwards:

.xml(filePath)
.filter("_attrname = 'Plastic'") // keep only TABLE elements with attrname="Plastic"
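A variation on filtering afterwards, if scanning the file once per table type is too costly, is to read it once with a union schema that contains every ROW attribute, explode the ROW array, and then select each table's own columns. This is a swapped-in technique, not part of the answer above. The sketch assumes the `"xml"` data source (spark-xml or Spark 4.0+), the default `"_"` attribute prefix, and string-typed values; `TableSplitter`, `columnsByAttrName`, `unionSchema` and `splitByAttrName` are hypothetical names.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}
import org.apache.spark.sql.types.{ArrayType, StringType, StructType}

object TableSplitter {

  // Columns (without the "_" attribute prefix) used by each table type,
  // taken from the sample XML.
  val columnsByAttrName: Map[String, Seq[String]] = Map(
    "Wood"    -> Seq("Weight", "Length", "color"),
    "Metal"   -> Seq("Type", "Weight", "Length", "Unit"),
    "Plastic" -> Seq("color", "Grade", "Price")
  )

  // Union of every ROW attribute across all table types, all read as strings.
  // Attributes a given ROW does not have come back as null.
  val unionSchema: StructType = {
    val rowType = columnsByAttrName.values.flatten.toSeq.distinct
      .foldLeft(new StructType())((st, name) => st.add(s"_$name", StringType))
    new StructType()
      .add("_attrname", StringType)
      .add("ROWDATA", new StructType().add("ROW", ArrayType(rowType)))
  }

  // One read of the file, then one DataFrame per attrname with only its columns.
  def splitByAttrName(spark: SparkSession, filePath: String): Map[String, DataFrame] = {
    val rows = spark.read
      .format("xml")
      .option("rowTag", "TABLE")
      .schema(unionSchema)
      .load(filePath)
      // one output row per ROW element, keeping the owning TABLE's attrname
      .select(col("_attrname"), explode(col("ROWDATA.ROW")).as("row"))
      .select(col("_attrname"), col("row.*"))

    columnsByAttrName.map { case (attrName, cols) =>
      attrName -> rows
        .filter(col("_attrname") === attrName)
        .select(cols.map(c => col(s"_$c")): _*)
    }
  }
}
```

Here the filter on `_attrname` still happens after reading, as in the answer; the difference is that there is a single scan and the per-table selection drops the null columns that belong to other table types.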
