Reputation: 1455
I have the following XML and I am using scala and spark to read it.
<TABLES>
<TABLE attrname="Red">
<ROWDATA>
<ROW Type="solid" track="0" Unit="0"/>
</ROWDATA>
</TABLE>
<TABLE attrname="Blue">
<ROWDATA>
<ROW Type="light" track="0" Unit="0"/>
<ROW Type="solid" track="0" Unit="0"/>
<ROW Type="solid" track="0" Unit="0"/>
</ROWDATA>
</TABLE>
This is the output I need:
I have the below code but it reads the unit,track,type as columns but I need them in rows. What would be the best way of doing this?
val xmlDFF = session.read
.option("rowTag", "TABLE")
.xml(filePath)
.select(explode_outer(col("ROWDATA.ROW").as("row")),col("_attrname").as("attrname"))
.select(col("attrname")
,col("col._unit")
,col("col._track")
,col("col._type"))
Upvotes: 0
Views: 307
Reputation: 10382
Check below code.
val xmlDFF = session.read
.option("rowTag", "TABLES")
.xml(filePath)
.withColumn("TABLE",explode_outer($"TABLE"))
.select(
row_number().over(Window.partitionBy(lit(1)).orderBy(lit(1))).as("obj_id"),
$"TABLE.*",
explode_outer($"TABLE.ROWDATA.ROW").as("row")
)
.select($"obj_id",$"_attrname",explode_outer(array(
struct(
lit("Type").as("Name"),
$"row._Type".as("Value")
),
struct(
lit("track").as("Name"),
$"row._track".as("Value")
),
struct(
lit("Unit").as("Name"),
$"row._Unit".as("Value")
)
)
).as("row"))
.select(
$"obj_id",
$"_attrname".as("Type"),
$"row.*"
)
.orderBy($"obj_id")
.show(false)
Output
+------+----+-----+-----+
|obj_id|Type|Name |Value|
+------+----+-----+-----+
|1 |Red |track|0 |
|1 |Red |Type |solid|
|1 |Red |Unit |0 |
|2 |Blue|Type |light|
|2 |Blue|Unit |0 |
|2 |Blue|track|0 |
|3 |Blue|Unit |0 |
|3 |Blue|Type |solid|
|3 |Blue|track|0 |
|4 |Blue|Type |solid|
|4 |Blue|track|0 |
|4 |Blue|Unit |0 |
+------+----+-----+-----+
Upvotes: 2