SIM

Reputation: 22440

Unable to fetch names out of some elements

I've written a script to get the food names out of the elements pasted below, but I can't find a way to get the names. I don't want to scrape the values (green, red, yellow). Is there any way I can get only the food names, that is the tag names, from the elements below?

This is what I've tried so far:

from bs4 import BeautifulSoup

content="""
<foods>
  <apple>green</apple>
  <strawberry>red</strawberry>
  <banana>yellow</banana>
</foods>
"""
soup = BeautifulSoup(content,"lxml")
data = [item for item in soup.select("foods")]
print(data)

If I run my script as it is, it prints the whole foods element, exactly as it appears in content.

Output I'm expecting:

apple,strawberry,banana

Upvotes: 0

Views: 60

Answers (2)

Alireza

Reputation: 168

Because your content is XML, you can extract the tag names with the xml.etree.ElementTree module like this:

import xml.etree.ElementTree as ET
content="""
<foods>
     <apple>green</apple>
     <strawberry>red</strawberry>
     <banana>yellow</banana>
</foods>
"""
foods = ET.fromstring(content)
# Each child element's tag is the food name; join the tags with commas
print(",".join(food.tag for food in foods))
# Output: apple,strawberry,banana

Upvotes: 1

Keyur Potdar

Reputation: 7238

Try this:

>>> data = [x.name for x in soup.find('foods').findChildren()]
>>> data
['apple', 'strawberry', 'banana']

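It is mostly self-explanatory: findChildren() (an older alias of find_all()) returns the tags nested inside foods, and .name gives each tag's name rather than its text.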
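For anyone who wants to run this outside the interpreter session, here is a minimal self-contained sketch of the same idea that also produces the comma-separated string asked for in the question. Two things in it are my assumptions and not part of the original snippet: it uses the built-in "html.parser" so that lxml does not have to be installed, and it passes recursive=False so that only direct children of foods are returned (with the sample content this gives the same result as findChildren()).

```python
from bs4 import BeautifulSoup

content = """
<foods>
  <apple>green</apple>
  <strawberry>red</strawberry>
  <banana>yellow</banana>
</foods>
"""

# Assumption: "html.parser" is used here instead of "lxml" to avoid the extra dependency
soup = BeautifulSoup(content, "html.parser")

# find_all(True, recursive=False) returns only the tags directly inside <foods>;
# .name is the tag name (apple), not the text inside it (green)
names = [child.name for child in soup.find("foods").find_all(True, recursive=False)]

# Join the names to get the single comma-separated line from the question
result = ",".join(names)
print(result)  # apple,strawberry,banana
```

If the foods element ever contains nested tags, dropping recursive=False would include those nested tag names as well, so which form to use depends on the real data.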

Upvotes: 2
