I am trying to parse an XML file that contains processing instructions using the Databricks spark-xml library.
Example XML
<books>
  <?SOURCE sample_file?>
  <?DATE 12/01/2022?>
  <book>
    <title>Spark Tutorial</title>
    <desc>Spark Tutorial for beginners</desc>
    <author>John C</author>
    <details>
      <price>1234</price>
      <pagecount>1000</pagecount>
      <chapters>
        <chapter>C1</chapter>
        <chapter>C2</chapter>
        <chapter>C3</chapter>
      </chapters>
    </details>
  </book>
  <book>
    <title>Scala</title>
    <desc>Scala Tutorial for beginners</desc>
    <author>John C</author>
    <details>
      <price>599</price>
      <pagecount>1000</pagecount>
      <chapters>
        <chapter>C10</chapter>
        <chapter>C20</chapter>
        <chapter>C30</chapter>
      </chapters>
    </details>
  </book>
</books>
Is there any way to parse the XML processing instructions SOURCE and DATE?
I can read the values of the other XML tags, but I am not able to read the processing instructions.
I tried the lxml library and was able to read the processing instructions with it, but I am not able to do the same using the spark-xml library.
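For reference, this is roughly the kind of non-Spark extraction that works for me. It is only a minimal sketch: it uses Python's standard-library xml.dom.minidom as a stand-in for lxml, the sample is a shortened copy of the file above, and the function name collect_processing_instructions is made up for illustration. It does not involve spark-xml at all.

```python
from xml.dom import minidom

# Shortened copy of the sample file from the question (assumption: the
# processing instructions sit directly under the <books> root element).
SAMPLE = """<books>
<?SOURCE sample_file?>
<?DATE 12/01/2022?>
<book>
<title>Spark Tutorial</title>
</book>
</books>"""


def collect_processing_instructions(xml_text):
    """Return {target: data} for every processing instruction in the document.

    Hypothetical helper for illustration; if a target appears more than
    once, the last occurrence wins.
    """
    doc = minidom.parseString(xml_text)
    found = {}

    def walk(node):
        # Processing instructions are their own DOM node type.
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE:
            found[node.target] = node.data
        for child in node.childNodes:
            walk(child)

    walk(doc)
    return found


pis = collect_processing_instructions(SAMPLE)
print(pis)  # {'SOURCE': 'sample_file', 'DATE': '12/01/2022'}
```

What I am looking for is the equivalent of the SOURCE and DATE values above, but obtained through spark-xml.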
Thanks in advance