I'm pretty new to web scraping, and I'm parsing an XML file with Beautiful Soup 4. I want to pull out, as strings, the URLs enclosed in the link tag: <link>https://www.whateveriwant.com</link>.
Here is what I have done so far:
from bs4 import BeautifulSoup as bs
bs_content = bs(content, "lxml")  # content is an entire XML file in one string
bsUrlsList = bs_content.find_all("link")  # gets all urls, fills a list with bs4 Tag objects
The problem is that when I iterate through bsUrlsList and print each item, it prints [,] rather than the actual string version of the link.
for link in bsUrlsList:
    print(type(link))  # prints <class 'bs4.element.Tag'>
Big Question: How do I convert a <class 'bs4.element.Tag'> object to a string, so that I can eventually build a list of URL strings pulled from the XML file?
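A minimal reproduction of empty link tags, offered as a sketch: it uses Beautiful Soup's built-in "html.parser" backend (so lxml is not needed) and a made-up one-item sample. HTML parsers treat <link> as a void (self-closing) element, as in <link rel="stylesheet" href="...">, so the URL ends up as a sibling of the tag instead of inside it. lxml's HTML mode handles <link> as an empty element as well, which would explain why the tags in the question come back without their URLs.

```python
from bs4 import BeautifulSoup

# Made-up sample in the shape of an RSS item (not the asker's real file)
sample = "<item><link>https://www.example.com/a</link></item>"

# An HTML parser closes <link> immediately, because it is a void element in HTML
soup = BeautifulSoup(sample, "html.parser")
tag = soup.find("link")

print(type(tag))              # <class 'bs4.element.Tag'>
print(tag)                    # <link/>
print(repr(tag.get_text()))   # '' because the tag itself holds no text
print(str(tag.next_sibling))  # https://www.example.com/a (the URL sits after the tag)
```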
Answer:
goal = [s.strip() for tag in soup.select("link") for s in tag.strings]  # select() returns a list, so read .strings from each tag
Or
goal = [x.text for x in soup.find_all("link")]
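Both one-liners only return URLs if the <link> tags actually contain text, and that depends on the parser. Here is a minimal sketch that assumes the file is parsed with Beautiful Soup's "xml" parser (an alias of "lxml-xml", which requires the lxml package) and uses a made-up RSS-like sample in place of the real file:

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the real XML file
content = """<rss><channel>
  <item><link>https://www.example.com/a</link></item>
  <item><link> https://www.example.com/b </link></item>
</channel></rss>"""

# "xml" selects lxml's XML parser, so <link> is an ordinary element that keeps its text
soup = BeautifulSoup(content, "xml")

# get_text(strip=True) returns each tag's text without surrounding whitespace
goal = [tag.get_text(strip=True) for tag in soup.find_all("link")]
print(goal)  # ['https://www.example.com/a', 'https://www.example.com/b']
```

The parsing line is the part that matters. With "lxml" (HTML mode) the same find_all call still returns Tag objects, but their text is empty.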