Reputation: 11
I am using Spark's `xpath` function to extract attribute values from an XML string. The function returns an array of values from the matching nodes. However, when one of the repeated elements has an empty (null) value, `xpath` silently drops it from the returned array. What I am looking for is a way to return a default string when a value is missing, so that the order and length of the returned arrays are preserved.
df = spark.createDataFrame([['<?xml version="1.0" encoding="UTF-8" standalone="yes"?><ns1:entities><ns1:entity ns1:type="PHYSICIANS"><ns1:entity ns1:instance="207" ns1:type="PHYSICIAN" ns1:id="P1"><ns1:attribute ns1:name="ID">2071</ns1:attribute><ns1:attribute ns1:name="NAME"></ns1:attribute></ns1:entity><ns1:entity ns1:instance="208" ns1:type="PHYSICIAN" ns1:id="P2"><ns1:attribute ns1:name="ID">2081</ns1:attribute><ns1:attribute ns1:name="NAME">Dr. James Hanover</ns1:attribute></ns1:entity></ns1:entity></ns1:entities>']], ['visitors'])
df = df.selectExpr(
    'xpath(visitors, "./entities/entity[@type=\'PHYSICIANS\']/entity/attribute[@name=\'ID\']/text()") ID',
    'xpath(visitors, "./entities/entity[@type=\'PHYSICIANS\']/entity/attribute[@name=\'NAME\']/text()") NAME'
)
display(df)
This gives me ID = ["2071", "2081"] but NAME = ["Dr. James Hanover"]: the empty NAME is dropped, so the two arrays no longer line up.
What I am expecting is for NAME to keep a placeholder for the missing value, e.g. ["default", "Dr. James Hanover"], so both arrays have the same length and order.
Can someone please help?
Upvotes: 0
Views: 583
Reputation: 14925
The xpath approach fails because `text()` on an empty element produces no node at all; since the first physician's NAME element is empty, nothing is added to the result array for it. I fixed the issue by replacing the empty element's content with a space before calling xpath.
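That string-level patch can be sketched with plain Python's `re` (in Spark SQL the equivalent would be `regexp_replace` on the column before calling `xpath`); the pattern only matches empty `ns1:attribute` elements, since non-empty ones have text between `>` and `</`:

```python
import re

xml = ('<ns1:attribute ns1:name="ID">2071</ns1:attribute>'
       '<ns1:attribute ns1:name="NAME"></ns1:attribute>')

# Put a single space inside empty attribute elements so that
# text() yields a node for them instead of skipping the value.
patched = re.sub(r"></ns1:attribute>", "> </ns1:attribute>", xml)
```

After this substitution, xpath returns a " " entry for the empty name, keeping the arrays aligned.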
However, this approach still gives a single row with two arrays, which is not what we really want.
A better solution is to use the spark-xml library from Databricks with a much simpler XML file.
The code below creates an example physician file using your data.
#
# 1 - create data file
#
# raw data
# note: the XML declaration must be the very first characters of the file,
# so the string starts immediately after the opening quotes
xml_str = """<?xml version="1.0"?>
<physicians>
  <physician>
    <id>2071</id>
    <name></name>
  </physician>
  <physician>
    <id>2081</id>
    <name>Dr. James Hanover</name>
  </physician>
</physicians>
"""
# raw file
dbutils.fs.put("/temp/physicians.xml", xml_str, True)
The next step is to install the spark-xml library on the cluster. Please see the documentation:
https://learn.microsoft.com/en-us/azure/databricks/data/data-sources/xml
Next, read the file into a data frame that has 2 rows and 2 columns.
#
# 2 - read data file
#
df2 = spark.read \
    .format("xml") \
    .option("rootTag", "physicians") \
    .option("rowTag", "physician") \
    .load("/temp/physicians.xml")
display(df2)
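For readers without a cluster handy, the same row-wise extraction with a default for missing names can be sketched using only the standard library (the "N/A" default string here is just an illustrative choice):

```python
import xml.etree.ElementTree as ET

xml_str = """<physicians>
  <physician><id>2071</id><name></name></physician>
  <physician><id>2081</id><name>Dr. James Hanover</name></physician>
</physicians>"""

# findtext() returns "" for an empty element, so `or` swaps in the default
rows = [
    (p.findtext("id"), p.findtext("name") or "N/A")
    for p in ET.fromstring(xml_str).iter("physician")
]
```

The same idea applies after the spark-xml read: the empty `<name>` most likely comes back as null in `df2`, which you can replace with a default via `df2.na.fill(...)`.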
Upvotes: 1