Reputation: 50497
Here's an example:
<p class='animal'>cats</p>
<p class='attribute'>they meow</p>
<p class='attribute'>they have fur</p>
<p class='animal'>turtles</p>
<p class='attribute'>they don't make noises</p>
<p class='attribute'>they have shells</p>
If each animal were in a separate element, I could just iterate over the elements. But the website I'm trying to parse has all the information in one element, as a flat list of sibling paragraphs.
What would be the best way to either split the soup into separate animals, or otherwise extract the attributes along with the animal each one belongs to?
(feel free to recommend a better title)
Upvotes: 1
Views: 359
Reputation: 304137
If you don't need to keep the animal names in order, you can simplify Jamie's answer like this:
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup("""
<p class='animal'>cats</p>
<p class='attribute'>they meow</p>
<p class='attribute'>they have fur</p>
<p class='animal'>turtles</p>
<p class='attribute'>they don't make noises</p>
<p class='attribute'>they have shells</p>
""")
attributes = {}
for p in soup.findAll('p'):
    if p['class'] == 'animal':
        animal = p.string
        attributes[animal] = []
    elif p['class'] == 'attribute':
        attributes[animal].append(p.string)
print attributes.keys()
print attributes
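The code above targets Python 2 and BeautifulSoup 3. For anyone on Python 3, here is a rough port of the same idea, assuming the third-party `bs4` package (Beautiful Soup 4) is installed. The function name `group_attributes` is my own, not part of any library. Two things differ from the original: bs4 returns `class` as a list, so the check becomes a membership test, and since Python 3.7 a plain dict keeps insertion order, so the animal names come out in document order anyway.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

HTML = """
<p class='animal'>cats</p>
<p class='attribute'>they meow</p>
<p class='attribute'>they have fur</p>
<p class='animal'>turtles</p>
<p class='attribute'>they don't make noises</p>
<p class='attribute'>they have shells</p>
"""


def group_attributes(html):
    """Map each animal to the attribute paragraphs that follow it.

    Hypothetical helper name; assumes the flat <p> layout from the question.
    """
    soup = BeautifulSoup(html, "html.parser")
    attributes = {}
    animal = None  # most recent animal seen, in document order
    for p in soup.find_all("p"):
        classes = p.get("class", [])  # bs4 gives a list for the class attribute
        text = p.get_text(strip=True)
        if "animal" in classes:
            animal = text
            attributes[animal] = []
        elif "attribute" in classes and animal is not None:
            # Attributes that appear before any animal are skipped.
            attributes[animal].append(text)
    return attributes


result = group_attributes(HTML)
print(list(result))
print(result)
```

The `animal is not None` guard is an addition of mine: it avoids a `NameError`-style failure if the page happens to start with an attribute paragraph.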
Upvotes: 2
Reputation: 18350
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup("""
<p class='animal'>cats</p>
<p class='attribute'>they meow</p>
<p class='attribute'>they have fur</p>
<p class='animal'>turtles</p>
<p class='attribute'>they don't make noises</p>
<p class='attribute'>they have shells</p>
""")
animals = []
attributes = {}
for p in soup.findAll('p'):
    if p['class'] == 'animal':
        animals.append(p.string)
    elif p['class'] == 'attribute':
        if animals[-1] not in attributes:
            attributes[animals[-1]] = [p.string]
        else:
            attributes[animals[-1]].append(p.string)
print animals
print attributes
That should work.
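If installing BeautifulSoup is not an option, the same "remember the last animal seen" approach can probably be done with the standard library alone. This is only a sketch using Python 3's `html.parser`; the class name `AnimalParser` is made up for illustration, and it assumes each `<p>` carries exactly one class value and plain text with no nested tags, as in the sample.

```python
from html.parser import HTMLParser

HTML = """
<p class='animal'>cats</p>
<p class='attribute'>they meow</p>
<p class='attribute'>they have fur</p>
<p class='animal'>turtles</p>
<p class='attribute'>they don't make noises</p>
<p class='attribute'>they have shells</p>
"""


class AnimalParser(HTMLParser):
    """Hypothetical parser that groups attribute paragraphs under animals."""

    def __init__(self):
        super().__init__()
        self.animals = []          # animal names in document order
        self.attributes = {}       # animal name -> list of attribute strings
        self._current_class = None  # class of the <p> we are inside, if any

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._current_class = dict(attrs).get("class")

    def handle_endtag(self, tag):
        if tag == "p":
            self._current_class = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return  # ignore whitespace between paragraphs
        if self._current_class == "animal":
            self.animals.append(text)
            self.attributes[text] = []
        elif self._current_class == "attribute" and self.animals:
            # Attach to the most recently seen animal.
            self.attributes[self.animals[-1]].append(text)


parser = AnimalParser()
parser.feed(HTML)
parser.close()
print(parser.animals)
print(parser.attributes)
```

Unlike the version above, this one also records animals that have no attributes (they get an empty list), which may or may not be what you want.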
Upvotes: 2