Reputation: 1455
I am trying to define a custom schema for the following XML using Spark and Scala.
<CONTAINER>
<TABLE attrname="Wood">
<ROWDATA>
<ROW Weight="55" Length="11" Unit="5"/>
</ROWDATA>
</TABLE>
<TABLE attrname="Metal">
<ROWDATA>
<ROW Weight="66" Length="23" Unit="0"/>
<ROW Weight="44" Length="22" Unit="0"/>
<ROW Weight="33" Length="21" Unit="1"/>
</ROWDATA>
</TABLE>
<TABLE attrname="Plastic">
<ROWDATA>
<ROW Weight="33" Length="11" Unit="0"/>
</ROWDATA>
</TABLE>
</CONTAINER>
This is the code I have tried, but printing the data frame gives no output. I also need the attribute names (attrname) in the data frame. I would really appreciate some help specifying the schema correctly.
val xmlDFF = session.read
  .option("rootTag", "CONTAINER")
  .option("rowTag", "TABLE")
  .schema(getContainderSchema)
  .xml(filePath)

def getContainderSchema: StructType = {
  val row = new StructType()
    .add("_Weight", StringType)
    .add("_Length", StringType)
    .add("_Unit", StringType)
  val rowdata = new StructType()
    .add("ROWDATA", ArrayType(row))
}
Upvotes: 0
Views: 364
Reputation: 1733
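Your schema is missing a level for the "TABLE" element itself. Because rowTag is "TABLE", the schema has to describe one TABLE element: its attrname attribute plus a ROWDATA struct that holds an array of ROW elements. (In your attempt ROWDATA was declared as the array, but the repeated element is ROW.)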
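import org.apache.spark.sql.types.{ArrayType, StringType, StructType}

// One <ROW/> element; its XML attributes use the "_" prefix seen in the question
val rowType = new StructType()
  .add("_Weight", StringType)
  .add("_Length", StringType)
  .add("_Unit", StringType)
// <ROWDATA> wraps the repeated <ROW/> elements
val rowDataType = new StructType()
  .add("ROW", ArrayType(rowType))
// <TABLE attrname="..."> is the rowTag, so this is the top-level schema
val tableType = new StructType()
  .add("_attrname", StringType)
  .add("ROWDATA", rowDataType)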
Then pass it to the reader in place of your schema:
.schema(tableType)
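To get the attribute names next to each row, the nested ROW array can be exploded after reading. The following is only a sketch under some assumptions: the spark-xml data source is on the classpath and is selected with format("xml") rather than the .xml(...) helper used in the question, the file path default ("container.xml") and the object name ReadContainerXml are hypothetical, and the output column names (attrname, Weight, Length, Unit) are chosen here for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}
import org.apache.spark.sql.types.{ArrayType, StringType, StructType}

object ReadContainerXml {
  // Schema for one <ROW/> element (attributes carry the "_" prefix)
  val rowType: StructType = new StructType()
    .add("_Weight", StringType)
    .add("_Length", StringType)
    .add("_Unit", StringType)

  // <ROWDATA> holds the repeated <ROW/> elements
  val rowDataType: StructType = new StructType()
    .add("ROW", ArrayType(rowType))

  // <TABLE attrname="..."> is the rowTag, so this is the top-level schema
  val tableType: StructType = new StructType()
    .add("_attrname", StringType)
    .add("ROWDATA", rowDataType)

  def main(args: Array[String]): Unit = {
    // Hypothetical default path; pass the real file as the first argument
    val filePath = args.headOption.getOrElse("container.xml")

    val session = SparkSession.builder()
      .appName("container-xml")
      .master("local[*]")
      .getOrCreate()

    // One record per <TABLE>, parsed with the explicit schema
    val tables = session.read
      .format("xml") // assumes the spark-xml data source is available
      .option("rootTag", "CONTAINER")
      .option("rowTag", "TABLE")
      .schema(tableType)
      .load(filePath)

    // One record per <ROW>, keeping the parent table's attrname
    val rows = tables
      .select(col("_attrname").as("attrname"), explode(col("ROWDATA.ROW")).as("row"))
      .select(
        col("attrname"),
        col("row._Weight").as("Weight"),
        col("row._Length").as("Length"),
        col("row._Unit").as("Unit")
      )

    rows.show()
    session.stop()
  }
}
```

With the sample XML from the question this should yield one line per ROW, each carrying the attrname of its parent TABLE (Wood, Metal, Plastic).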
Upvotes: 1