Reputation: 982
I have parsed an XML file with BeautifulSoup in Python and I am having trouble extracting the data out of it. An example of the structure of the XML is below:
<Products page="0" pages="-1" records="27">
<Product id="ABC001">
<Name>This product name</Name>
<Cur>USD</Cur>
<Tag>Text</Tag>
<Classes>
<Class id="USD">
<ClassCur>USD</ClassCur>
<Identifier>XYZ123456</Identifier>
</Class>
</Classes>
</Product>
<Product id="XYZ002">
<Name>That product name</Name>
<Cur>EUR</Cur>
<Tag>More Text</Tag>
<Classes>
<Class id="EUR">
<ClassCur>EUR</ClassCur>
<Identifier>VDSHG123456</Identifier>
</Class>
</Classes>
</Product>
</Products>
The first thing I have been trying to do, so far without success, is extract all of the Product and Class ids ("ABC001", "XYZ002", etc.).
What I have tried is
products = soup.find_all("Product")
for p in products:
    print(p.find("name"))  # gets the name tag
    print(p.find("cur"))   # gets the cur tag
    # ...etc
However, I can't figure out how to access id within Product. For example, p.find("product") returns None.
Note that while I am using bs4 I don't have to - it's just that I have done a lot of web scraping with Python + bs4 and have found bs4 to be useful in navigating through HTML, so assumed it would be the ideal way of handling XML.
Upvotes: 0
Views: 49
Reputation: 57610
id is an attribute of Product, not a child element, so you access it with:
p['id']
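For a fuller picture, here is a minimal sketch that collects both the Product and Class ids from a trimmed copy of the XML in the question. It assumes the built-in "html.parser" backend, which lowercases tag names (hence "product" and "class" below); with an XML-aware parser the names would keep their original case. The variable names xml, product_ids and class_ids are only illustrative. It also shows why p.find("product") returns None: find() only searches the descendants of p, and a Product has no Product inside it.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the XML from the question.
xml = """
<Products page="0" pages="-1" records="27">
  <Product id="ABC001">
    <Name>This product name</Name>
    <Classes>
      <Class id="USD"><Identifier>XYZ123456</Identifier></Class>
    </Classes>
  </Product>
  <Product id="XYZ002">
    <Name>That product name</Name>
    <Classes>
      <Class id="EUR"><Identifier>VDSHG123456</Identifier></Class>
    </Classes>
  </Product>
</Products>
"""

# Assumption: "html.parser" is used, so tag names are lowercased on parse.
soup = BeautifulSoup(xml, "html.parser")

product_ids = []
class_ids = []
for p in soup.find_all("product"):
    # Attributes are read with dict-style subscripting on the tag.
    product_ids.append(p["id"])
    for c in p.find_all("class"):
        # .get() returns None instead of raising if the attribute is missing.
        class_ids.append(c.get("id"))

print(product_ids)
print(class_ids)
```

Subscripting (p["id"]) raises KeyError when the attribute is absent, so p.get("id") is the safer choice if some elements might not carry an id.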
Upvotes: 1