Separating HTML into groups using BeautifulSoup when groups are all in the same element

Question

Here's an example:

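<p class="animal">cats</p>
<p class="attribute">they meow</p>
<p class="attribute">they have fur</p>
<p class="animal">turtles</p>
<p class="attribute">they don't make noises</p>
<p class="attribute">they have shells</p>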

If each animal were in a separate element, I could just iterate over the elements. That would be great. But the website I'm trying to parse has all the information in one element.

What would be the best way either to separate the soup into different animals, or to extract the attributes in some other way along with the animal each one belongs to?

(feel free to recommend a better title)

Jamie Wong · Accepted Answer

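from bs4 import BeautifulSoup

soup = BeautifulSoup("""
<p class="animal">cats</p>
<p class="attribute">they meow</p>
<p class="attribute">they have fur</p>
<p class="animal">turtles</p>
<p class="attribute">they don't make noises</p>
<p class="attribute">they have shells</p>
""", "html.parser")

animals = []     # animal names, in document order
attributes = {}  # animal name -> list of its attribute strings

for p in soup.find_all('p'):
    # bs4 returns the class attribute as a list of class names
    classes = p.get('class', [])
    if 'animal' in classes:
        animals.append(p.string)
    elif 'attribute' in classes and animals:
        # an attribute belongs to the most recently seen animal
        attributes.setdefault(animals[-1], []).append(p.string)

print(animals)
print(attributes)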

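That should work: the loop visits the p elements in document order, so each attribute is filed under the most recently seen animal.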
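If you would rather split the soup into per-animal groups directly, here is a rough sketch of another way to do it. It assumes the bs4 package and the same p/class markup as the example above. The function name group_by_animal is made up for illustration. Starting from each animal, it walks the following siblings and stops when the next animal begins.

```python
from bs4 import BeautifulSoup

HTML = """
<p class="animal">cats</p>
<p class="attribute">they meow</p>
<p class="attribute">they have fur</p>
<p class="animal">turtles</p>
<p class="attribute">they don't make noises</p>
<p class="attribute">they have shells</p>
"""


def group_by_animal(html):
    """Map each animal name to the attribute strings that follow it.

    Hypothetical helper: assumes animals and attributes are sibling
    <p> elements distinguished only by their class.
    """
    soup = BeautifulSoup(html, "html.parser")
    groups = {}
    for animal in soup.find_all("p", class_="animal"):
        name = animal.get_text(strip=True)
        groups[name] = []
        # Walk the following <p> siblings until the next animal starts.
        for sibling in animal.find_next_siblings("p"):
            classes = sibling.get("class", [])
            if "animal" in classes:
                break
            if "attribute" in classes:
                groups[name].append(sibling.get_text(strip=True))
    return groups


groups = group_by_animal(HTML)
print(groups)
```

This version also records animals that have no attributes (they map to an empty list), which may or may not be what you want.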
