RNs_Ghost

Reputation: 1777

Extracting tag information with BeautifulSoup and Python

Say I have some XML like this:

<item name=bread weight="5" edible="yes">
<body> some blah </body>
<item>

<item name=eggs weight="5" edible="yes">
<body> some blah </body>
<item>

<item name=meat weight="5" edible="yes">
<body> some blah </body>
<item>

I want to store the name of each item in a list using BeautifulSoup.

Here's the attempt so far:

names = list()

for c in soup.findAll("item"):
    # get the name from the tag (this is the part I can't work out)
    names.append(...)  # placeholder for the name I got from the tag

This method has worked perfectly for extracting text between tags.

I've tried copying the methods used for extracting links (<a href="www.blah.com">), but they don't seem to work here.

How would I store the name information in a list? Other lists contain the body text, so the indexes have to stay consistent across the lists.

Thanks very much

Upvotes: 2

Views: 469

Answers (1)

bossylobster

Reputation: 10163

Use dict(item.attrs).get('name') to get the name.

You are having issues because the second <item> in each block should be a closing tag (</item>) but is written as an opening tag, so you get 6 matches rather than 3. If you have any control over the text, please use proper closing tags to avoid this.
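
To see where the 6 matches come from, here is a minimal sketch that only counts opening <item> tags in the sample from the question. Note that it uses Python's standard-library html.parser rather than BeautifulSoup, so it runs without any extra install; the class name ItemTagCounter and the variable names are made up for illustration.

```python
from html.parser import HTMLParser

# Sample markup from the question, including the unclosed <item> tags.
SAMPLE = """
<item name=bread weight="5" edible="yes">
<body> some blah </body>
<item>

<item name=eggs weight="5" edible="yes">
<body> some blah </body>
<item>

<item name=meat weight="5" edible="yes">
<body> some blah </body>
<item>
"""


class ItemTagCounter(HTMLParser):
    """Hypothetical helper: records the name attribute of every opening <item> tag."""

    def __init__(self):
        super().__init__()
        # One entry per opening <item>; None when the tag has no name attribute.
        self.seen = []

    def handle_starttag(self, tag, attrs):
        if tag == "item":
            # attrs arrives as a list of (key, value) pairs.
            self.seen.append(dict(attrs).get("name"))


counter = ItemTagCounter()
counter.feed(SAMPLE)
counter.close()

# Keep only the tags that actually carry a name.
names = [name for name in counter.seen if name is not None]

print(len(counter.seen))  # number of opening <item> tags the parser saw
print(names)
```

Every bare <item> shows up as an extra opening tag with no name, which is why the None check is needed when collecting names.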

Here is the full snippet working as intended:

names = list()

for item in soup.findAll('item'):
    name = dict(item.attrs).get('name')
    if name is not None:
        names.append(name)
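
For anyone on a newer install, here is a rough self-contained equivalent. It assumes BeautifulSoup 4 (the bs4 package) with the built-in "html.parser" backend, which may not be the version used in the question; there, find_all is the current spelling of findAll and a tag's attributes can be read directly with item.get('name'). The markup variable and the bodies list are only there for illustration.

```python
from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4 is installed

# Sample markup from the question, including the unclosed <item> tags.
markup = """
<item name=bread weight="5" edible="yes">
<body> some blah </body>
<item>

<item name=eggs weight="5" edible="yes">
<body> some blah </body>
<item>

<item name=meat weight="5" edible="yes">
<body> some blah </body>
<item>
"""

soup = BeautifulSoup(markup, "html.parser")

names = []
for item in soup.find_all("item"):
    name = item.get("name")  # None when the tag has no name attribute
    if name is not None:
        names.append(name)

# Body text collected in document order, so it lines up with names by index.
bodies = [body.get_text(strip=True) for body in soup.find_all("body")]

print(names)
print(bodies)
```

Since the bare <item> tags are skipped and each named item has exactly one body in this sample, names and bodies end up the same length and in the same order. If some items could lack a body, the two lists would need to be built in a single loop instead.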

Upvotes: 2
