Trouble parsing XML with Python

Question

I have parsed an XML file with BeautifulSoup in Python and I am having trouble extracting the data out of it. An example of the structure of the XML is below:


  
    This product name
    USD
    Text
    
      
        USD
        XYZ123456
      
    
  
  
    That product name
    EUR
    More Text
    
      
        EUR
        VDSHG123456

The first thing I have been trying to do, so far without success, is extract all of the Product and Class ids ("ABC001", "XYZ002", etc.).

What I have tried is

products = soup.find_all("Product")

for p in products:
    print(p.find("name")) # gets the name tag
    print(p.find("cur")) # gets the cur tag
    # ...etc

However, I can't figure out how to access id within Product. For example, p.find("product") returns None.

Note that I don't have to use bs4. I have done a lot of web scraping with Python and bs4 and found it useful for navigating HTML, so I assumed it would be the ideal way of handling XML too.

jwodder · Accepted Answer

id is an attribute of Product, not a child element, so you access it with:

p['id']
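
To make the attribute-versus-child distinction concrete, here is a minimal sketch that uses only the standard library's xml.etree.ElementTree rather than bs4, since the question says bs4 is optional. The sample document is an assumption: the tags in the question's XML were not preserved, so the Products wrapper, the placement of Class, and the id attributes are guessed from the names and ids the question mentions.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: element and attribute names are guessed from the
# question (Product, Class, name, cur, ids like "ABC001" and "XYZ002").
SAMPLE = """
<Products>
  <Product id="ABC001">
    <name>This product name</name>
    <cur>USD</cur>
    <Class id="XYZ002"/>
  </Product>
  <Product id="ABC002">
    <name>That product name</name>
    <cur>EUR</cur>
    <Class id="XYZ003"/>
  </Product>
</Products>
"""

root = ET.fromstring(SAMPLE)

# Attributes are read from the element itself with .get(), which returns
# None when the attribute is missing.
product_ids = [p.get("id") for p in root.iter("Product")]
class_ids = [c.get("id") for c in root.iter("Class")]

# Child elements are looked up by tag name; findtext() returns their text.
names = [p.findtext("name") for p in root.iter("Product")]

# Searching for a child element called "id" finds nothing, because id is
# an attribute here, not an element.
missing = root.find("Product").find("id")

print(product_ids, class_ids, names, missing)
```

In bs4 the same idea applies: p['id'] raises KeyError if the attribute is absent, p.get('id') returns None instead, and p.attrs gives a dict of all attributes on the tag.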
