BeautifulSoup: Merge all p elements into one string?

Question

I currently use the following Python code excerpt to get all <p> elements of a webpage:

from bs4 import BeautifulSoup

def scraping(url, html):
    data = {}
    soup = BeautifulSoup(html,"lxml")

    data["news"] = []

    page = soup.find("div", {"class":"container_news"}).findAll('p')
    page_text = ''

    for p in page:
        page_text += ''.join(p.findAll(text = True))
        data["news"].append(page_text)
    print(page_text)

    return data

However, the output of page_text looks like:

"['New news on the internet. ', 'Here is some text. ', ""Here is some other."", ""And then there are other variations 

Looks like there are some non-text elements. 
\xa0""]" ...

Is it possible to clean up the content and merge the lists into one string? BeautifulSoup solutions would be preferred over regex variants.

Thank you!


Answers (1)
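
One possible approach, offered as a minimal sketch rather than a definitive fix: collect the cleaned text of each <p> in a list and join that list once after the loop, instead of growing page_text and appending it on every iteration. The sketch assumes that collapsing all whitespace (including \xa0) into single spaces is acceptable, that "news" may hold a single string instead of a list, and it uses the built-in "html.parser" so it runs without extra installs; the "lxml" parser from the question should behave the same way here. sample_html is made-up test input, not the real page.

```python
from bs4 import BeautifulSoup


def scraping(url, html):
    # Assumption: "html.parser" stands in for the question's "lxml" parser.
    soup = BeautifulSoup(html, "html.parser")

    container = soup.find("div", {"class": "container_news"})
    if container is None:
        # No matching container on the page: return an empty result.
        return {"news": ""}

    paragraphs = []
    for p in container.find_all("p"):
        # get_text() concatenates every text node inside the <p>.
        # str.split() with no arguments splits on any whitespace,
        # including newlines and \xa0, so re-joining collapses them.
        text = " ".join(p.get_text().split())
        if text:
            paragraphs.append(text)

    # Join once, after the loop, so each paragraph appears exactly once.
    return {"news": " ".join(paragraphs)}


# Hypothetical sample input for illustration only.
sample_html = """
<div class="container_news">
  <p>New news on the internet. </p>
  <p>Here is <b>some</b> text.&nbsp;</p>
  <p>
  </p>
</div>
"""
result = scraping("http://example.com", sample_html)
print(result["news"])
```

The url parameter is kept only to match the signature in the question; the sketch does not use it. If the paragraphs should stay separated, joining with "\n" instead of " " in the last step would keep one paragraph per line.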
