I am working with the spark-xml connector in an Azure Databricks notebook, and I need to set the root tag from a variable so that it also carries a namespace attribute and a fixed GMFVerb block. Here is my code snippet:
root_tag = """PurchaseService xmlns:NS1="http://purchaseservice.parts.com/1_2"
<GMFVerb>
<Get>
<Action>RetrieveHSClassification</Action>
</Get>
</GMFVerb>
"""
# Functions used below
from pyspark.sql.functions import col, current_timestamp, lit, struct

# Select necessary columns
df_nestedXML = df.select(
    struct(
        lit("1.0").alias("MsgVersion"),
        lit("IT").alias("SenderID"),
        current_timestamp().alias("SendTime")
    ).alias("GmfHeader"),
    col("VNN").alias("PartNumber"),
    lit("IT").alias("CustomsRequestorID"),
    col("Country").alias("CountryCode"),
    lit("4").alias("Priority")
)
# Coalesce and write XML
df_nestedXML \
.coalesce(1) \
.write \
.format("com.databricks.spark.xml") \
.option("rootTag", root_tag) \
.option("rowTag", "PartHSCode") \
.mode("overwrite") \
.save(output_path)
This is the expected output I am looking for. Note that the GMFVerb section should appear only once in the XML, while PartHSCode can repeat multiple times.
<PurchaseService xmlns:NS1="http://purchaseservice.parts.com/1_2">
<GMFVerb>
<Get>
<Action>RetrieveHSClassification</Action>
</Get>
</GMFVerb>
<PartHSCode>
<GmfHeader>
<n1:MsgVersion>1.0</n1:MsgVersion>
<n1:SenderID>xxx</n1:SenderID>
<n1:SendTime>2024-01-23T13:15:30.45+01:00</n1:SendTime>
</GmfHeader>
<PartNumber>1Mxxxxxxxx</PartNumber>
<CustomsRequestorID>xxxxx</CustomsRequestorID>
<CountryCode>US</CountryCode>
<Priority>4</Priority>
</PartHSCode>
<PartHSCode>
<GmfHeader>
<n1:MsgVersion>1.0</n1:MsgVersion>
<n1:SenderID>xxx</n1:SenderID>
<n1:SendTime>2024-01-23T13:15:30.45+01:00</n1:SendTime>
</GmfHeader>
<PartNumber>11xxxxxxxx</PartNumber>
<CustomsRequestorID>yyyy</CustomsRequestorID>
<CountryCode>IN</CountryCode>
<Priority>4</Priority>
</PartHSCode>
</PurchaseService>
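One possible way to get a block that appears exactly once under the root is to keep the rootTag option simple (the element name, optionally with the namespace attribute) and add the GMFVerb block in a post-processing step after the file is written. The sketch below only shows that string step in plain Python. The helper name inject_after_root and the GMF_VERB constant are hypothetical, not part of spark-xml, and the sketch assumes the written file contains a single opening PurchaseService tag.

```python
import re

# Hypothetical constant: the block that must appear once, right under the root.
GMF_VERB = """<GMFVerb>
<Get>
<Action>RetrieveHSClassification</Action>
</Get>
</GMFVerb>"""


def inject_after_root(xml_text: str, root_name: str, block: str) -> str:
    """Insert `block` right after the first opening <root_name ...> tag."""
    # Match <root_name> or <root_name attr="..."> but not </root_name>
    # and not a longer element name that merely starts with root_name.
    pattern = re.compile(r"<" + re.escape(root_name) + r"(\s[^>]*)?>")
    match = pattern.search(xml_text)
    if match is None:
        raise ValueError(f"opening <{root_name}> tag not found")
    end = match.end()
    return xml_text[:end] + "\n" + block + xml_text[end:]


if __name__ == "__main__":
    sample = (
        '<PurchaseService xmlns:NS1="http://purchaseservice.parts.com/1_2">\n'
        "<PartHSCode>\n"
        "<PartNumber>1Mxxxxxxxx</PartNumber>\n"
        "</PartHSCode>\n"
        "</PurchaseService>"
    )
    print(inject_after_root(sample, "PurchaseService", GMF_VERB))
```

In a notebook this would be applied to the text of the single part file produced by coalesce(1); how that file is read and rewritten (for example through a mounted path) depends on the workspace setup and is left out here.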