Mithu Tokder

Reputation: 11

Parsing XML processing instructions in PySpark

I am trying to parse an XML file that contains processing instructions using the Databricks spark-xml library. Example XML:

<books>
    <?SOURCE sample_file?>
    <?DATE 12/01/2022?>
    <book>
        <title>Spark Tutorial</title>
        <desc>Spark Tutorial for beginners</desc>
        <author>John C</author>
        <details>
            <price>1234</price>
            <pagecount>1000</pagecount>
            <chapters>
                <chapter>C1</chapter>
                <chapter>C2</chapter>
                <chapter>C3</chapter>
            </chapters>
        </details>
    </book>
    <book>
        <title>Scala</title>
        <desc>Scala Tutorial for beginners</desc>
        <author>John C</author>
        <details>
            <price>599</price>
            <pagecount>1000</pagecount>
            <chapters>
                <chapter>C10</chapter>
                <chapter>C20</chapter>
                <chapter>C30</chapter>
            </chapters>
        </details>
    </book>
</books>

Is there any way to read the XML processing instructions SOURCE and DATE with spark-xml? I can read the values of the other XML elements, but not the processing instructions.

I tried the lxml library and was able to read the processing instructions with it, but I cannot do the same with the spark-xml library.
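
For reference, here is a minimal sketch of the kind of non-Spark extraction that works. It uses the standard-library `xml.dom.minidom` rather than lxml so that it runs without extra dependencies. The function name `extract_processing_instructions` and the shortened `sample` document are only illustrative.

```python
from xml.dom import minidom


def extract_processing_instructions(xml_text):
    """Return {target: data} for every processing instruction in xml_text.

    Hypothetical helper for illustration; not part of spark-xml or lxml.
    If the same target occurs more than once, the last occurrence wins.
    """
    doc = minidom.parseString(xml_text)
    found = {}

    def walk(node):
        # Processing instructions expose .target (e.g. "SOURCE")
        # and .data (e.g. "sample_file").
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE:
            found[node.target] = node.data
        for child in node.childNodes:
            walk(child)

    walk(doc)
    return found


# Shortened version of the example file above.
sample = """<books>
    <?SOURCE sample_file?>
    <?DATE 12/01/2022?>
    <book><title>Spark Tutorial</title></book>
</books>"""

pis = extract_processing_instructions(sample)
print(pis["SOURCE"], pis["DATE"])
```

One possible workaround, which I have not verified at scale, would be to read each file as a whole string (for example with `spark.sparkContext.wholeTextFiles`) and apply a function like this per file, then attach the result to the rows that spark-xml produces. I would prefer a way to get these values from spark-xml directly.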

Thanks in advance

Upvotes: 1

Views: 89

Answers (0)