lawson

Reputation: 377

Combine find_all beautiful soup tags into one string

I'm scraping a page with BeautifulSoup and the html.parser backend. I've selected the part of the HTML I want to work with and saved it as 'containers'.

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
import ssl

my_url = 'https://www._________.co.uk/'
context = ssl._create_unverified_context()
uClient = uReq(my_url, context=context)
page_html = uClient.read()
uClient.close()

# Parse the page and collect every <div class="row">
page_soup = soup(page_html, "html.parser")
containers = page_soup.find_all("div", {"class": "row"})

I'm stuck on a few <b> tags that sit next to one another inside a span.

I can select them with the following, where container is one item from containers:

company_string = container.span.find_all("b")

Which returns the following:

[<b>Company</b>, <b>Name</b>, <b>Limited</b>]

How can I ditch the tags and combine these into a string so that it outputs as 'Company Name Limited'?

Original html is here:

<span class="company">
<a href="/cmp/Company-Name-Limited" onmousedown="this.href = 
appendParamsOnce(this.href, 'xxxx')" rel="noopener" target="_blank">
<b>Company</b> <b>Name</b> <b>Limited</b>
</a>
</span>

Upvotes: 3

Views: 2716

Answers (2)

nandal

Reputation: 2634

Try the following:

outputString = ' '.join([item.get_text() for item in company_string])

This returns a single string containing the text of each element in company_string, joined with a space.
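Here is a self-contained sketch of the same idea, assuming the HTML from the question is pasted in as a string. The names html, span and output_string are only illustrative, and the onmousedown attribute is left out for brevity:

```python
from bs4 import BeautifulSoup

# HTML copied from the question (onmousedown attribute omitted for brevity)
html = """
<span class="company">
<a href="/cmp/Company-Name-Limited" rel="noopener" target="_blank">
<b>Company</b> <b>Name</b> <b>Limited</b>
</a>
</span>
"""

# Stand-in for container.span in the question
span = BeautifulSoup(html, "html.parser").span

# find_all returns the three <b> tags; get_text() drops the markup
company_string = span.find_all("b")
output_string = " ".join(item.get_text() for item in company_string)

print(output_string)
```

A generator expression works inside join just as well as a list comprehension here, so the square brackets are optional.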

Upvotes: 1

akash karothiya

Reputation: 5950

Use the .text attribute of each tag:

>>> output = ' '.join([item.text for item in company_string])
>>> output
'Company Name Limited'
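As a possible alternative (not from the answers above), you can skip find_all and call get_text on the span itself with a separator and strip=True. This is a minimal sketch that assumes the span holds nothing but the company name; the html string and variable names are made up for the example:

```python
from bs4 import BeautifulSoup

# Trimmed-down version of the markup from the question
html = (
    '<span class="company"><a href="/cmp/Company-Name-Limited">\n'
    "<b>Company</b> <b>Name</b> <b>Limited</b>\n"
    "</a></span>"
)
span = BeautifulSoup(html, "html.parser").span

# .text is a property that gives the same result as get_text() with no arguments
first_b = span.find("b")
same = first_b.text == first_b.get_text()

# strip=True trims each text fragment and skips the whitespace-only ones,
# then the remaining fragments are joined with the separator
company = span.get_text(" ", strip=True)

print(company)
```

If the span also contains other text you don't want, joining the individual <b> tags as shown in the answers is the safer choice.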

Upvotes: 9
