devanathan

Reputation: 818

Generate XML with attribute and value in Spark(scala) using databricks

I want to create nested XML from a CSV/DataFrame in Scala Spark. I am using the Databricks spark-xml library to convert the DataFrame to XML.

I am trying to produce the output below, but have been unable to achieve it:

<rows> 
<row>
<name id="10">Mahashree</name>
</row>
</rows>

I tried a struct column like this:

{"_VALUE":"Mahashree","_id":10}

but the result was:

<rows> 
<row>
<name id="10" VALUE="Mahashree"></name>
</row>
</rows>

The Databricks documentation covers reading nested XML into a DataFrame, but not writing a DataFrame out as nested XML. For example, reading this input:

<one>
    <two myTwoAttrib="BBBBB">two</two>
    <three>three</three>
</one>

produces the schema below:

root
 |-- two: struct (nullable = true)
 |    |-- _VALUE: string (nullable = true)
 |    |-- _myTwoAttrib: string (nullable = true)
 |-- three: string (nullable = true)

Can anyone help me generate a nested element that has both attributes and a text value?

Thanks in advance.

Upvotes: 0

Views: 1911

Answers (1)

pasha701

Reputation: 7207

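This can be achieved with the two options "attributePrefix" and "valueTag", described here: https://github.com/databricks/spark-xml

The output in the question suggests that "_VALUE" was itself treated as an attribute, since it begins with the "_" attribute prefix. For example, it should work if you add an extra underscore to "id" in the struct, so that only "__id" carries the attribute prefix: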

{"_VALUE":"Mahashree","__id":10}

Then save with these options:

.option("attributePrefix", "__")
.option("valueTag", "_VALUE")
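
For completeness, here is a minimal sketch of how the struct and the two options above might fit together in a full write. It assumes spark-xml is on the classpath; the input column names (name_value, name_id), the helper withNameStruct, and the output path are hypothetical and chosen only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, struct}

object NestedXmlSketch {

  // Wrap the flat columns into a struct named "name".
  // "_VALUE" is intended as the element text, "__id" as the attribute "id"
  // (given the attributePrefix "__" set on the writer below).
  // The input column names are assumptions, not taken from the question.
  def withNameStruct(df: DataFrame): DataFrame =
    df.select(
      struct(
        col("name_value").as("_VALUE"),
        col("name_id").as("__id")
      ).as("name")
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("nested-xml-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for the CSV data from the question.
    val df = Seq(("Mahashree", 10)).toDF("name_value", "name_id")

    withNameStruct(df).write
      .format("com.databricks.spark.xml")
      .option("rootTag", "rows")
      .option("rowTag", "row")
      .option("attributePrefix", "__") // "_VALUE" no longer matches the prefix
      .option("valueTag", "_VALUE")
      .mode("overwrite")
      .save("/tmp/nested-xml-sketch") // hypothetical output path

    spark.stop()
  }
}
```

The idea is that with a two-underscore prefix only "__id" is recognised as an attribute, while "_VALUE" is left to act as the value tag, which should give a name element with an id attribute and text content as in the question's target output.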

Upvotes: 4
