Faraz Siddiqi

Reputation: 37

How to convert Beautiful Soup 4 Tag objects into a String?

I'm pretty new to web scraping, and I am parsing an XML file with Beautiful Soup 4. I want to pull out, as strings, the URLs enclosed by the link tag: <link>https://www.whateveriwant.com</link>.

Here is what I have done so far:

from bs4 import BeautifulSoup as bs
bs_content = bs(content, "lxml")  # content is an entire XML file in one string
bsUrlsList = bs_content.find_all("link")  # finds every link element; returns a list of bs4 Tag objects

The problem is that when I iterate through bsUrlsList and print each item, it prints a list of [,] rather than the actual string version of the link.

for link in bsUrlsList:
  print(type(link)) # prints <class 'bs4.element.Tag'>

Big Question: How do I convert <class 'bs4.element.Tag'> to a String so that I can eventually have a list of string urls that I pulled from the XML file?

Upvotes: 1

Views: 893

Answers (1)

goal = [s for tag in soup.select("link") for s in tag.strings]  # select() returns a list of tags, so collect each tag's strings

Or

goal = [x.text for x in soup.find_all("link")]
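
Note that either approach only returns text if the parser kept the URL inside the tag. A likely cause of the empty output in the question is the "lxml" HTML parser, which treats link as an empty (void) element as in HTML, so the URL text ends up outside the tag. Here is a minimal sketch that parses with the "xml" feature instead. It assumes lxml is installed, and the xml_content string and the example.com URL are made-up sample data, not taken from the question:

```python
from bs4 import BeautifulSoup

# Hypothetical sample document standing in for the asker's XML file.
xml_content = """<rss><channel>
<link>https://www.whateveriwant.com</link>
<item><link> https://example.com/a </link></item>
</channel></rss>"""

# "xml" selects lxml's XML parser, which does not apply HTML's
# void-element rule to <link>, so the URL stays inside the tag.
soup = BeautifulSoup(xml_content, "xml")
link_tags = soup.find_all("link")

# Text inside each tag, with surrounding whitespace stripped.
urls = [tag.get_text(strip=True) for tag in link_tags]

# str(tag) gives the tag's markup as a string, if that is what is needed.
markup = [str(tag) for tag in link_tags]

print(urls)
```

get_text(strip=True) is used here rather than .text because the sample tags, like the one in the question, may contain spaces around the URL.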

Upvotes: 0
