Reputation: 22440
I've written a script to get the food names out of the elements pasted below, but I can't find a way to get the names. I don't want to scrape the values. Is there any way to get only the food names from the elements below?
This is what I've tried so far:
from bs4 import BeautifulSoup
content="""
<foods>
<apple>green</apple>
<strawberry>red</strawberry>
<banana>yellow</banana>
</foods>
"""
soup = BeautifulSoup(content,"lxml")
data = [item for item in soup.select("foods")]
print(data)
If I run my script as is, it prints exactly the same elements that are in content.
Output I'm expecting:
apple,strawberry,banana
Upvotes: 0
Views: 60
Reputation: 168
Because your content is XML, you can extract the names with the xml.etree.ElementTree module like this:
import xml.etree.ElementTree as ET
content="""
<foods>
<apple>green</apple>
<strawberry>red</strawberry>
<banana>yellow</banana>
</foods>
"""
foods = ET.fromstring(content)
for food in foods:
    print(food.tag)  # each child's tag is the food name
# Prints apple, strawberry and banana, one per line
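To get the single comma-separated line the question asks for, the tags can be joined instead of printed one at a time. A minimal sketch using only the standard library; the variable name `names` is just illustrative:

```python
import xml.etree.ElementTree as ET

content = """
<foods>
<apple>green</apple>
<strawberry>red</strawberry>
<banana>yellow</banana>
</foods>
"""

# Iterating over the root element yields its direct children only.
foods = ET.fromstring(content)

# Join the tag names into one comma-separated string.
names = ",".join(child.tag for child in foods)
print(names)
```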
Upvotes: 1
Reputation: 7238
Try this:
>>> data = [x.name for x in soup.find('foods').findChildren()]
>>> data
['apple', 'strawberry', 'banana']
findChildren() returns every tag nested under foods, and .name gives each tag's name rather than its text.
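Note that findChildren() walks all descendants, so if a food element ever contained nested tags, those would show up too. A sketch that limits the search to direct children and joins the result into the expected string; it assumes the built-in "html.parser" in place of "lxml" only to avoid the extra dependency, and the names `foods` and `names` are illustrative:

```python
from bs4 import BeautifulSoup

content = """
<foods>
<apple>green</apple>
<strawberry>red</strawberry>
<banana>yellow</banana>
</foods>
"""

# Assumption: "html.parser" is used here so the example needs only bs4.
soup = BeautifulSoup(content, "html.parser")
foods = soup.find("foods")

# find_all(True) matches any tag; recursive=False keeps direct children only.
names = [child.name for child in foods.find_all(True, recursive=False)]
print(",".join(names))
```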
Upvotes: 2