Reputation: 199
Spark SQL (Databricks) function xpath ignores empty tags in XML. For example for below XML XPATH returns array ["ABC"]. We need it as ["ABC", NULL or empty string] because when we use this along with posexplode_outer, omitting null in XPATH will cause incorrect association. Is there any option in XPATH to retain nulls? I did not find any detailed documentation for this.
SELECT xpath("<PARTY>
<PARTY_EVENT>
<EVENTTYPE>VISITED</EVENTTYPE>
<LOCATION>ABC</LOCATION>
</PARTY_EVENT>
<PARTY_EVENT>
<EVENTTYPE>VISITED</EVENTTYPE>
<LOCATION />
</PARTY_EVENT>
</PARTY>"
, '/PARTY/PARTY_EVENT/LOCATION/text()')
Upvotes: 0
Views: 655
Reputation: 973
Add the nodes not containing any text by using [not(text())]
selector:
(/PARTY/PARTY_EVENT/LOCATION|/PARTY/PARTY_EVENT/LOCATION[not(text())])/text()
Upvotes: 1