Matt

Reputation: 982

Trouble parsing XML with Python

I have parsed an XML file with BeautifulSoup in Python and I am having trouble extracting the data out of it. An example of the structure of the XML is below:

<Products page="0" pages="-1" records="27">
  <Product id="ABC001">
    <Name>This product name</Name>
    <Cur>USD</Cur>
    <Tag>Text</Tag>
    <Classes>
      <Class id="USD">
        <ClassCur>USD</ClassCur>
        <Identifier>XYZ123456</Identifier>
      </Class>
    </Classes>
  </Product>
  <Product id="XYZ002">
    <Name>That product name</Name>
    <Cur>EUR</Cur>
    <Tag>More Text</Tag>
    <Classes>
      <Class id="EUR">
        <ClassCur>EUR</ClassCur>
        <Identifier>VDSHG123456</Identifier>
      </Class>
    </Classes>
  </Product>
</Products>

The first thing I am trying to do, so far without success, is extract all of the Product and Class ids ("ABC001", "XYZ002", etc.).

What I have tried is:

products = soup.find_all("product")  # assuming an HTML parser, which lowercases tag names (consistent with find("name") below)

for p in products:
    print(p.find("name")) # gets the name tag
    print(p.find("cur")) # gets the cur tag
    # ...etc

However, I can't figure out how to access the id attribute of each Product. For example, p.find("product") returns None.

Note that I am using bs4 but don't have to. I have done a lot of web scraping with Python + bs4 and found it useful for navigating HTML, so I assumed it would be the ideal way of handling XML too.

Upvotes: 0

Views: 49

Answers (1)

jwodder

Reputation: 57610

id is an attribute of Product, not a child element, so you access it with:

p['id']
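
Since the question says bs4 is not a requirement, here is a minimal sketch of the same attribute-versus-child distinction using the standard library's xml.etree.ElementTree instead of bs4. This is a swapped-in alternative, not the bs4 code from the question. It assumes the XML is available as a string; the names XML, product_ids, class_ids and names are made up for illustration, and the sample is a trimmed copy of the document in the question.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document from the question (assumed to be in a string).
XML = """\
<Products page="0" pages="-1" records="27">
  <Product id="ABC001">
    <Name>This product name</Name>
    <Classes>
      <Class id="USD">
        <Identifier>XYZ123456</Identifier>
      </Class>
    </Classes>
  </Product>
  <Product id="XYZ002">
    <Name>That product name</Name>
    <Classes>
      <Class id="EUR">
        <Identifier>VDSHG123456</Identifier>
      </Class>
    </Classes>
  </Product>
</Products>
"""

root = ET.fromstring(XML)

# Attributes are read with .get() (or .attrib[...]), not by searching for a child.
product_ids = [p.get("id") for p in root.iter("Product")]
class_ids = [c.get("id") for c in root.iter("Class")]

# Child elements, by contrast, are looked up by tag name.
names = [p.findtext("Name") for p in root.iter("Product")]

print(product_ids)
print(class_ids)
print(names)
```

Note that ElementTree tag names are case-sensitive, so "Product" and "Class" are written exactly as they appear in the file. In bs4, p.get("id") is a lenient variant of p['id'] that returns None instead of raising KeyError when the attribute is missing.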

Upvotes: 1
