Reputation: 165
I have a sample XML file and need to parse it into key-value pairs in a Spark Dataset (Spark 2.2, Java 1.8).
sample.xml:
<?xml version="1.0" encoding="UTF-8"?>
<RECORD>
  <PROP NAME="xxx">
    <PVAL>123</PVAL>
  </PROP>
  <PROP NAME="yyy">
    <PVAL>456</PVAL>
  </PROP>
  <PROP NAME="zzz">
    <PVAL>786</PVAL>
  </PROP>
</RECORD>
I tried the following code:
Dataset<Row> xmlDS = spark.read()
.format("com.databricks.spark.xml")
.option("rowTag", "RECORD")
.load("sample.xml");
xmlDS.printSchema();
root
|-- PROP: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- PVAL: string (nullable = true)
| | |-- _NAME: string (nullable = true)
The output I got from the above:
+---------------------------------+
|PROP                             |
+---------------------------------+
|[[123,xxx], [456,yyy], [786,zzz]]|
+---------------------------------+
The expected output is a Dataset in key-value format:
NAME PVAL
-----------------------------
xxx 123
yyy 456
zzz 786
Can someone help with this? Thanks.
Upvotes: 0
Views: 204
Reputation: 41957
All you need to do is change the rowTag to PROP and add a rootTag of RECORD, as follows:
Dataset<Row> xmlDS = spark.read()
.format("com.databricks.spark.xml")
.option("rootTag", "RECORD")
.option("rowTag", "PROP")
.load("sample.xml");
xmlDS.printSchema();
xmlDS.show(false);
which should give you
root
|-- PVAL: long (nullable = true)
|-- _NAME: string (nullable = true)
+----+-----+
|PVAL|_NAME|
+----+-----+
|123 |xxx |
|456 |yyy |
|786 |zzz |
+----+-----+
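To sanity-check the NAME/PVAL pairs you expect without starting Spark, a minimal sketch using only the JDK's built-in DOM parser might look like the one below. This is not spark-xml and not part of the solution above. The class name PropKeyValueSketch and the helper toKeyValue are hypothetical names made up for illustration, and the sample XML is inlined as a string rather than read from sample.xml.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class PropKeyValueSketch {

    // Hypothetical helper (not a Spark or spark-xml API): maps each PROP
    // element's NAME attribute to the text of its first PVAL child.
    public static Map<String, String> toKeyValue(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            // LinkedHashMap keeps the pairs in document order.
            Map<String, String> pairs = new LinkedHashMap<>();
            NodeList props = doc.getElementsByTagName("PROP");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                NodeList pvals = prop.getElementsByTagName("PVAL");
                if (pvals.getLength() > 0) {
                    pairs.put(prop.getAttribute("NAME"),
                            pvals.item(0).getTextContent().trim());
                }
            }
            return pairs;
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers need no throws clause.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Same content as sample.xml in the question, inlined for the sketch.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<RECORD>"
                + "<PROP NAME=\"xxx\"><PVAL>123</PVAL></PROP>"
                + "<PROP NAME=\"yyy\"><PVAL>456</PVAL></PROP>"
                + "<PROP NAME=\"zzz\"><PVAL>786</PVAL></PROP>"
                + "</RECORD>";
        toKeyValue(xml).forEach((name, pval) -> System.out.println(name + " " + pval));
    }
}
```

Each PROP element becomes one key-value pair here, which mirrors what setting rowTag to PROP does in the Spark version: one row per PROP rather than one row per RECORD.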
I hope this answer is helpful.
Upvotes: 1