anuj

Reputation: 142

Azure Databricks: customizing the root tag with the spark-xml connector

I am working with the spark-xml connector inside an Azure Databricks notebook, and I need to customize the root tag using a variable. Here is my code snippet:

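from pyspark.sql.functions import col, current_timestamp, lit, struct

# Root tag value passed to spark-xml, with the GMFVerb block appended to it
root_tag = """PurchaseService xmlns:NS1="http://purchaseservice.parts.com/1_2" 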
         <GMFVerb>
         <Get>
                 <Action>RetrieveHSClassification</Action>
             </Get>
         </GMFVerb>
 """
# Select necessary columns
df_nestedXML = df.select(
    struct(
        lit("1.0").alias("MsgVersion"),
        lit("IT").alias("SenderID"),
        current_timestamp().alias("SendTime")
    ).alias("GmfHeader"),
    col("VNN").alias("PartNumber"),
    lit("IT").alias("CustomsRequestorID"),
    col("Country").alias("CountryCode"),
    lit("4").alias("Priority")
)

# Coalesce and write XML
df_nestedXML \
    .coalesce(1) \
    .write \
    .format("com.databricks.spark.xml") \
    .option("rootTag", root_tag) \
    .option("rowTag", "PartHSCode") \
    .mode("overwrite") \
    .save(output_path)

This is the expected output I am looking for. Note that the GMFVerb section should appear only once in the XML, while PartHSCode can repeat multiple times.

<PurchaseService xmlns:NS1="http://purchaseservice.parts.com/1_2">
    <GMFVerb>
        <Get>
            <Action>RetrieveHSClassification</Action>
        </Get>
    </GMFVerb>
    <PartHSCode>
        <GmfHeader>
            <n1:MsgVersion>1.0</n1:MsgVersion>
            <n1:SenderID>xxx</n1:SenderID>
            <n1:SendTime>2024-01-23T13:15:30.45+01:00</n1:SendTime>
        </GmfHeader>
        <PartNumber>1Mxxxxxxxx</PartNumber>
        <CustomsRequestorID>xxxxx</CustomsRequestorID>
        <CountryCode>US</CountryCode>
        <Priority>4</Priority>
    </PartHSCode>
    <PartHSCode>
        <GmfHeader>
            <n1:MsgVersion>1.0</n1:MsgVersion>
            <n1:SenderID>xxx</n1:SenderID>
            <n1:SendTime>2024-01-23T13:15:30.45+01:00</n1:SendTime>
        </GmfHeader>
        <PartNumber>11xxxxxxxx</PartNumber>
        <CustomsRequestorID>yyyy</CustomsRequestorID>
        <CountryCode>IN</CountryCode>
        <Priority>4</Priority>
    </PartHSCode>
</PurchaseService>
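
To make the target structure concrete, here is a minimal sketch that builds the same document with Python's standard library instead of spark-xml. It is only an illustration of the shape (GMFVerb once, then one PartHSCode per row), not a spark-xml solution. The function name `build_purchase_service` and the plain-dict row format are hypothetical, the `n1:` prefix on the header fields is left out, and it assumes the rows are small enough to hold in driver memory.

```python
import xml.etree.ElementTree as ET

NS1_URI = "http://purchaseservice.parts.com/1_2"


def build_purchase_service(rows):
    """Return the XML string: one GMFVerb block, then one PartHSCode per row.

    `rows` is assumed to be a list of dicts with string values (hypothetical format).
    """
    # The namespace declaration is written as a literal attribute on the root.
    root = ET.Element("PurchaseService", {"xmlns:NS1": NS1_URI})

    # GMFVerb is emitted exactly once, before any PartHSCode element.
    get = ET.SubElement(ET.SubElement(root, "GMFVerb"), "Get")
    ET.SubElement(get, "Action").text = "RetrieveHSClassification"

    for row in rows:
        part = ET.SubElement(root, "PartHSCode")
        header = ET.SubElement(part, "GmfHeader")
        for name in ("MsgVersion", "SenderID", "SendTime"):
            ET.SubElement(header, name).text = row[name]
        for name in ("PartNumber", "CustomsRequestorID", "CountryCode", "Priority"):
            ET.SubElement(part, name).text = row[name]

    return ET.tostring(root, encoding="unicode")


# Placeholder rows mirroring the expected output above.
sample_rows = [
    {"MsgVersion": "1.0", "SenderID": "IT", "SendTime": "2024-01-23T13:15:30",
     "PartNumber": "1Mxxxxxxxx", "CustomsRequestorID": "IT",
     "CountryCode": "US", "Priority": "4"},
    {"MsgVersion": "1.0", "SenderID": "IT", "SendTime": "2024-01-23T13:15:30",
     "PartNumber": "11xxxxxxxx", "CustomsRequestorID": "IT",
     "CountryCode": "IN", "Priority": "4"},
]
xml_text = build_purchase_service(sample_rows)
```

What I would like is the same result written directly by spark-xml, with the root tag and the GMFVerb block driven by a variable.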
