user2647763 - RIMD

Reputation: 94

How can I browse & list XPATH of a XML Message?


**** See the EDIT section below.

Thanks for looking into this issue. I am not sure whether this is the right forum for this question; if not, please point me to the right one.

We have a complex XML message (data in XML format). We are looking for a way to extract every XPath in this message together with its element- and attribute-level data content. We tried XMLSpy and XML::Twig, but had no luck. xml_grep pulls data if we give it an XPath as input, but it has no option to list all the XPaths of an XML message.

I have a well-formed XML message. I want to produce a list/report containing:

  1. Every XPath in the XML message.

  2. Every XPath together with the data content found at that XPath.

Here is an example (input XML message):

<?xml version="1.0"?>
<PARTS>
<TITLE>Computer Parts</TITLE>
<PART>
<ITEM>Motherboard</ITEM>
<MANUFACTURER>ASUS</MANUFACTURER>
<MODEL>P3B-F</MODEL>
<COST> 123.00</COST>
</PART>
<PART>
<ITEM>Video Card</ITEM>
<MANUFACTURER>ATI</MANUFACTURER>
<MODEL>All-in-Wonder Pro</MODEL>
<COST> 160.00</COST>
</PART>
<PART>
<ITEM>Sound Card</ITEM>
<MANUFACTURER>Creative Labs</MANUFACTURER>
<MODEL>Sound Blaster Live</MODEL>
<COST> 80.00</COST>
</PART>
<PART>
<ITEM>inch Monitor</ITEM>
<MANUFACTURER>LG Electronics</MANUFACTURER>
<MODEL> 995E</MODEL>
<COST> 290.00</COST>
</PART>
</PARTS>

The desired output (I created the following list manually):

/PARTS/TITLE                Computer Parts
/PARTS/PART[1]/ITEM         Motherboard
/PARTS/PART[1]/MANUFACTURER ASUS
/PARTS/PART[1]/MODEL        P3B-F
/PARTS/PART[1]/COST         123.00
/PARTS/PART[2]/ITEM         Video Card
/PARTS/PART[2]/MANUFACTURER ATI
............
..............
..................
...................

Is there any open source product that produces such a report for an XML message?

What are the ways to extract the XPaths and their data content?

Thanks for letting me pick the brains of this forum.

+++++

Thanks. The above code outputs:

Field|Value
/*|

/*/*[1]|X
/*/*[2]|000000000
/*/*[3]|000000000
/*/*[4]|&
/*/*[5]|

I am not able to get the XPaths in text form.

Here is the input XML:

<CorrectedW2Ind>X</CorrectedW2Ind>
<EmployeeSSN>000000000</EmployeeSSN>
<EmployerEIN>000000000</EmployerEIN>
<EmployerNameControlTxt>&amp;</EmployerNameControlTxt>
<EmployerName>
    <BusinessNameLine1Txt>#</BusinessNameLine1Txt>
    <BusinessNameLine2Txt>#</BusinessNameLine2Txt>
</EmployerName>
<EmployerUSAddress>
    <AddressLine1Txt>0</AddressLine1Txt>
    <AddressLine2Txt>0</AddressLine2Txt>
    <CityNm>A</CityNm>
    <StateAbbreviationCd>PW</StateAbbreviationCd>
    <ZIPCd>00000</ZIPCd>
</EmployerUSAddress>

<EmployersUseGrp>
    <EmployersUseCd>A</EmployersUseCd>
    <PriorUSERRAContributionYr>00</PriorUSERRAContributionYr>
    <EmployersUseAmt>0</EmployersUseAmt>
</EmployersUseGrp>
<EmployersUseGrp>
    <EmployersUseCd>A</EmployersUseCd>
    <PriorUSERRAContributionYr>00</PriorUSERRAContributionYr>
    <EmployersUseAmt>0</EmployersUseAmt>
</EmployersUseGrp>
<EmployersUseGrp>
    <EmployersUseCd>A</EmployersUseCd>
    <PriorUSERRAContributionYr>00</PriorUSERRAContributionYr>
    <EmployersUseAmt>0</EmployersUseAmt>
</EmployersUseGrp>
<EmployersUseGrp>
    <EmployersUseCd>A</EmployersUseCd>
    <PriorUSERRAContributionYr>00</PriorUSERRAContributionYr>
    <EmployersUseAmt>0</EmployersUseAmt>
</EmployersUseGrp>
<EmployersUseGrp>
    <EmployersUseCd>A</EmployersUseCd>
    <PriorUSERRAContributionYr>00</PriorUSERRAContributionYr>
    <EmployersUseAmt>0</EmployersUseAmt>
</EmployersUseGrp>

a) Which lxml method should I use to get the value and the XPath (as text) with the above code?

b) Which lxml method should I use to get an aggregate count of a repeating group node?

For example: XPath of EmployersUseGrp ====> 5

EDIT ===== 6/26/2019 ========================

I am not able to open new questions; I am getting a "question limit exceeded" message, so I am posting the follow-up to this code here.

I am trying to use the Python code posted in the answer, but I am getting weird output.

I have a large XML file (inputf.xml). I set input = 'inputf.xml' in the posted code.




    <?xml version="1.0" encoding="UTF-8"?>
      <DataFileFor>
        <DataR>
           <Id>5070022019330a0050hq</Id>
             <NUM>30221730001019</NUM>
             <Postmark>2020-01-03T09:25:57.000-05:00</Postmark>
             <TNO>47647</TNO>
.
.
.
.
.
</DataFileFor>

++++

When I grab the XPath of a node using xml_grep, I get the following:

xml_grep DataFileFor/DataR/Ret/W2 inputf.xml ===> output


<?xml version="1.0" ?>
<xml_grep version="0.7" date="Fri Jun 26 13:07:11 2020">
<file filename="inputf.xml">
  <W2 Id="W2" dName="W2" sId="00000000" sVersionNum="String">
    <CorrectedW2Ind>X</CorrectedW2Ind>
    <EmployeeSSN>000000000</EmployeeSSN>
    <EmployerEIN>000000000</EmployerEIN>
    <EmployerNameControlTxt>S</EmployerNameControlTxt>
    <EmployerName>
      <BusinessNameLine1Txt>String</BusinessNameLine1Txt>
      <BusinessNameLine2Txt>String</BusinessNameLine2Txt>
    </EmployerName>
    <EmployerUSAddress>
      <AddressLine1Txt>String</AddressLine1Txt>
      <AddressLine2Txt>String</AddressLine2Txt>
      <CityNm>String</CityNm>
      <StateAbbreviationCd>AL</StateAbbreviationCd>
      <ZIPCd>000000000</ZIPCd>
.
.
.
.
.
</W2>

When I use this code, it does not produce readable XPaths. The output XPaths look like this:


/DataFileFor/DataR/*[8]/*[2]/*[6]/*[3]/*[10]|X
/DataFileFor/DataR/*[8]/*[2]/*[6]/*[3]/*[11]|00000000
/DataFileFor/DataR/*[8]/*[2]/*[6]/*[3]/*[12]|00000000
/DataFileFor/DataR/*[8]/*[2]/*[6]/*[3]/*[13]|S
/DataFileFor/DataR/*[8]/*[2]/*[6]/*[3]/*[14]|String

The attributes

Id="W2" dName="W2" sId="00000000" sVersionNum="String"

are not showing up in the output.

What changes to the code are required to fix this?

Thanks for your guidance.

Upvotes: 0

Views: 615

Answers (1)

IbnStack

Reputation: 61

I have just seen this. I wrote something in Python that does this; it outputs a pipe-delimited CSV. Feel free to use it. I am happy to answer any questions, but don't expect an immediate response.

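from lxml import etree


def parseXML(xmlFile, outputFile):
    """
    Write one "xpath|text" row per element of xmlFile to outputFile.
    """
    # Parse straight from the file so lxml handles the encoding declaration.
    tree = etree.parse(xmlFile)
    root = tree.getroot()

    with open(outputFile, 'w', encoding='utf-8') as f:
        f.write("%s|%s\n" % ("Field", "Value"))
        # Passing etree.Element restricts the walk to elements
        # (comments and processing instructions are skipped).
        for e in root.iter(etree.Element):
            # e.text is None for empty elements; write an empty value instead.
            text = (e.text or '').strip()
            f.write("%s|%s\n" % (tree.getpath(e), text))


if __name__ == "__main__":
    print('Loading variables...')
    input_file = '16a.xml'  # renamed so the built-in input() is not shadowed
    output_file = input_file + '.csv'

    parseXML(input_file, output_file)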
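
On the follow-up about unreadable paths and missing attributes: the `*[n]` steps are most likely what `getpath` emits for elements in a default (unprefixed) namespace, and the loop above only writes element text, never attributes. Here is a minimal sketch of one way around both, plus a count of repeating groups. Treat it as an illustration under assumptions rather than a fix to lxml: it uses the standard library's xml.etree.ElementTree instead of lxml, it builds the paths by hand from local names, and the names local_name, walk, list_xpaths, count_groups, SAMPLE and the urn:example namespace are all made up for the example. Because namespaces are stripped, the paths are readable labels and will not resolve as XPath queries against a namespaced document without extra work.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample; the namespace URI is invented for illustration.
SAMPLE = """<W2 xmlns="urn:example" Id="W2">
  <EmployeeSSN>000000000</EmployeeSSN>
  <EmployersUseGrp><EmployersUseCd>A</EmployersUseCd></EmployersUseGrp>
  <EmployersUseGrp><EmployersUseCd>A</EmployersUseCd></EmployersUseGrp>
</W2>"""


def local_name(tag):
    """Drop a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit('}', 1)[-1]


def walk(elem, path):
    """Yield (path, value) for elem, its attributes, and all descendants."""
    yield path, (elem.text or '').strip()
    for name, value in elem.attrib.items():
        yield '%s/@%s' % (path, local_name(name)), value
    children = list(elem)
    totals = Counter(local_name(c.tag) for c in children)
    seen = Counter()
    for child in children:
        name = local_name(child.tag)
        seen[name] += 1
        # Add a positional index only when the name repeats among siblings.
        step = '%s[%d]' % (name, seen[name]) if totals[name] > 1 else name
        yield from walk(child, '%s/%s' % (path, step))


def list_xpaths(xml_text):
    """Return every (path, value) pair in document order."""
    root = ET.fromstring(xml_text)
    return list(walk(root, '/' + local_name(root.tag)))


def count_groups(xml_text):
    """Count how many elements share each index-free path."""
    counts = Counter()

    def visit(elem, path):
        counts[path] += 1
        for child in elem:
            visit(child, path + '/' + local_name(child.tag))

    root = ET.fromstring(xml_text)
    visit(root, '/' + local_name(root.tag))
    return counts


if __name__ == "__main__":
    # Same pipe-delimited layout as the script above.
    print("Field|Value")
    for path, value in list_xpaths(SAMPLE):
        print("%s|%s" % (path, value))
    for path, n in count_groups(SAMPLE).items():
        print("%s ====> %d" % (path, n))
```

If you would rather stay with lxml, `tree.getelementpath(e)` may also be worth a look; as far as I know it keeps the tag names (in `{namespace}tag` form) in places where `getpath` falls back to `*`.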

Upvotes: 1
