Reputation: 1777
Say I have some XML like:
<item name=bread weight="5" edible="yes">
<body> some blah </body>
<item>
<item name=eggs weight="5" edible="yes">
<body> some blah </body>
<item>
<item name=meat weight="5" edible="yes">
<body> some blah </body>
<item>
I want to store the name of each item in a list using Beautiful Soup.
Here's my attempt so far:
names = list()
for c in soup.findAll("item"):
    # get name from the tag
    names.append(name i got from tag)
This method has worked perfectly for extracting text between tags.
I've tried copying the methods used for extracting links, such as <a href="www.blah.com">, but that doesn't seem to work.
How would I store the name information in a list? (Other lists contain the body text, so the indexes have to stay consistent across the lists.)
Thanks very much
Upvotes: 2
Views: 469
Reputation: 10163
Use dict(item.attrs).get('name') to get the name.
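You are having issues because the closing tags in your sample are written as <item> rather than </item>. The parser treats each one as another opening tag, so you get 6 matches rather than 3, and half of them have no name attribute. If you have any control over the text, please use proper closing tags to avoid this.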
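To see the effect of the missing slash, here is a small sketch that counts the matches for the sample as posted and for a corrected copy. It assumes Beautiful Soup 4 (the bs4 package) with the built-in "html.parser" backend, neither of which the question specifies; the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# The sample exactly as posted: each bare <item> was meant to be </item>.
broken = """\
<item name=bread weight="5" edible="yes">
<body> some blah </body>
<item>
<item name=eggs weight="5" edible="yes">
<body> some blah </body>
<item>
<item name=meat weight="5" edible="yes">
<body> some blah </body>
<item>
"""

# Only the attribute-less tags match the literal "<item>", so this
# turns them into real closing tags and leaves the others alone.
fixed = broken.replace("<item>", "</item>")

# Assumption: bs4 with the stdlib-backed "html.parser" builder.
broken_count = len(BeautifulSoup(broken, "html.parser").find_all("item"))
fixed_count = len(BeautifulSoup(fixed, "html.parser").find_all("item"))

print(broken_count, fixed_count)
```

With the posted markup every second match is an empty item tag, which is why the snippet has to check for a missing name.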
Here is the full snippet working as intended:
names = list()
for item in soup.findAll('item'):
    name = dict(item.attrs).get('name')
    if name is not None:
        names.append(name)
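If you are on Beautiful Soup 4, where attrs is already a dict, a variant along the following lines may be simpler. This is only a sketch: it assumes the bs4 package and the "html.parser" backend, and the bodies list is a hypothetical stand-in for the other lists mentioned in the question. Filtering on the presence of the name attribute skips the stray tags, and collecting the body in the same loop keeps the indexes aligned.

```python
from bs4 import BeautifulSoup

markup = """\
<item name=bread weight="5" edible="yes">
<body> some blah </body>
<item>
<item name=eggs weight="5" edible="yes">
<body> some blah </body>
<item>
<item name=meat weight="5" edible="yes">
<body> some blah </body>
<item>
"""

# Assumption: bs4 with the stdlib-backed "html.parser" builder.
soup = BeautifulSoup(markup, "html.parser")

names = []
bodies = []  # hypothetical parallel list for the body text

# attrs={"name": True} matches only <item> tags that carry a name attribute,
# so the attribute-less stray tags are never visited.
for item in soup.find_all("item", attrs={"name": True}):
    names.append(item["name"])
    body = item.find("body")  # first <body> inside this item
    bodies.append(body.get_text(strip=True) if body is not None else None)

print(names)
```

Because both lists are appended to in the same iteration, names[i] and bodies[i] always refer to the same item.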
Upvotes: 2