nee21

Reputation: 199

Spark SQL (Databricks) function xpath ignores empty tags in XML

The Spark SQL (Databricks) function xpath ignores empty tags in XML. For example, for the XML below, xpath returns the array ["ABC"]. We need it as ["ABC", NULL] (or an empty string), because when we use this result together with posexplode_outer, omitting the null causes incorrect association between positions. Is there any option in xpath to retain nulls? I did not find any detailed documentation on this.


SELECT xpath("<PARTY>
                <PARTY_EVENT>
                    <EVENTTYPE>VISITED</EVENTTYPE>
                    <LOCATION>ABC</LOCATION>
                </PARTY_EVENT>
                <PARTY_EVENT>
                    <EVENTTYPE>VISITED</EVENTTYPE>
                    <LOCATION />
                </PARTY_EVENT>
                </PARTY>"
        , '/PARTY/PARTY_EVENT/LOCATION/text()')
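
To make the association problem concrete, here is a minimal self-contained sketch (the src CTE and the party_xml alias are illustrative names, not part of the original query):

WITH src AS (
  SELECT "<PARTY>
            <PARTY_EVENT><EVENTTYPE>VISITED</EVENTTYPE><LOCATION>ABC</LOCATION></PARTY_EVENT>
            <PARTY_EVENT><EVENTTYPE>VISITED</EVENTTYPE><LOCATION /></PARTY_EVENT>
          </PARTY>" AS party_xml
)
SELECT
  xpath(party_xml, '/PARTY/PARTY_EVENT/EVENTTYPE/text()') AS event_types,  -- ["VISITED", "VISITED"]
  xpath(party_xml, '/PARTY/PARTY_EVENT/LOCATION/text()')  AS locations     -- ["ABC"]: the empty <LOCATION /> is dropped
FROM src
-- event_types has two entries but locations has only one, so exploding by
-- position pairs the second VISITED with nothing instead of a NULL location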


Upvotes: 0

Views: 655

Answers (1)

Arseny

Reputation: 973

Add the nodes that do not contain any text by using the [not(text())] selector:

(/PARTY/PARTY_EVENT/LOCATION/text()|/PARTY/PARTY_EVENT/LOCATION[not(text())])
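
A minimal sketch of how this could be combined with the asker's posexplode_outer approach (the src and arrays CTE names and the locations[pos] lookup are illustrative assumptions, not part of the original answer):

WITH src AS (
  SELECT "<PARTY>
            <PARTY_EVENT><EVENTTYPE>VISITED</EVENTTYPE><LOCATION>ABC</LOCATION></PARTY_EVENT>
            <PARTY_EVENT><EVENTTYPE>VISITED</EVENTTYPE><LOCATION /></PARTY_EVENT>
          </PARTY>" AS party_xml
),
arrays AS (
  SELECT
    xpath(party_xml, '/PARTY/PARTY_EVENT/EVENTTYPE/text()') AS event_types,
    xpath(party_xml,
          '(/PARTY/PARTY_EVENT/LOCATION/text()|/PARTY/PARTY_EVENT/LOCATION[not(text())])') AS locations
  FROM src
)
SELECT pos, event_type, locations[pos] AS location
FROM arrays
LATERAL VIEW posexplode_outer(event_types) ev AS pos, event_type
-- both arrays now have two entries, so position 1 pairs the second VISITED
-- with a NULL (or empty) location instead of dropping it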

Upvotes: 1
