Sindu_
Sindu_

Reputation: 1455

Read XML in Spark and Scala

I have the following XML and I am using scala and spark to read it.

<TABLES>
    <TABLE attrname="Red">
        <ROWDATA>
            <ROW Type="solid" track="0" Unit="0"/>
        </ROWDATA>
    </TABLE>
    <TABLE attrname="Blue">
        <ROWDATA>
            <ROW Type="light" track="0" Unit="0"/>
            <ROW Type="solid" track="0" Unit="0"/>
            <ROW Type="solid" track="0" Unit="0"/>
        </ROWDATA>
    </TABLE>

This is the output I need:

enter image description here

I have the below code but it reads the unit,track,type as columns but I need them in rows. What would be the best way of doing this?

val xmlDFF = session.read
      .option("rowTag", "TABLE")
      .xml(filePath)
      .select(explode_outer(col("ROWDATA.ROW").as("row")),col("_attrname").as("attrname"))
      .select(col("attrname")
        ,col("col._unit")
        ,col("col._track")
        ,col("col._type"))
 

Upvotes: 0

Views: 307

Answers (1)

s.polam
s.polam

Reputation: 10382

Check below code.

val xmlDFF = session.read
    .option("rowTag", "TABLES")
    .xml(filePath)
     .withColumn("TABLE",explode_outer($"TABLE"))
    .select(
      row_number().over(Window.partitionBy(lit(1)).orderBy(lit(1))).as("obj_id"),
      $"TABLE.*",
      explode_outer($"TABLE.ROWDATA.ROW").as("row")
    )
    .select($"obj_id",$"_attrname",explode_outer(array(
      struct(
        lit("Type").as("Name"),
        $"row._Type".as("Value")
      ),
      struct(
        lit("track").as("Name"),
        $"row._track".as("Value")
      ),
      struct(
        lit("Unit").as("Name"),
        $"row._Unit".as("Value")
      )
    )
    ).as("row"))
    .select(
      $"obj_id",
      $"_attrname".as("Type"),
      $"row.*"
    )
    .orderBy($"obj_id")
    .show(false)

Output

+------+----+-----+-----+
|obj_id|Type|Name |Value|
+------+----+-----+-----+
|1     |Red |track|0    |
|1     |Red |Type |solid|
|1     |Red |Unit |0    |
|2     |Blue|Type |light|
|2     |Blue|Unit |0    |
|2     |Blue|track|0    |
|3     |Blue|Unit |0    |
|3     |Blue|Type |solid|
|3     |Blue|track|0    |
|4     |Blue|Type |solid|
|4     |Blue|track|0    |
|4     |Blue|Unit |0    |
+------+----+-----+-----+

Upvotes: 2

Related Questions