BeautifulSoup: do not insert a line break with soup.get_text for certain tags like <b>

Question

I am trying to parse the text from some HTML using Beautiful Soup. I have selected the node, but when I try to get the text, it is broken up at the inline tags:

HTML element


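<!-- wrapper and <p> tags are assumed; the class name and <b> tags come from the answer's selectors -->
<div class="_1dcffiq">
   <p>Join the Adventure</p>
   <p>If you're <b>passionate</b> about <b>improving the health</b> of others and working on a <b>big problem</b>.</p>
   <p>Thank you</p>
</div>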

Python code (element holds the HTML div shown above)

element.get_text(" | ")

Current output is

Join the Adventure | If you're  | passionate |  about  | improving the health |  of others and working on a  | big problem | . | Thank you

So get_text(' | ') inserts the separator at every tag boundary, including inline tags such as <b>. My requirement is to not break on the inline tags and to get the text as:

Expected output

Join the Adventure | If you're passionate about  improving the health of others and working on a big problem . | Thank you

I am looking for a generic solution as my div is not fixed.

Andrej Kesely · Accepted Answer

You can .unwrap() the <b> tags inside the element and then call .smooth() to merge the adjacent text nodes that are left behind:

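from bs4 import BeautifulSoup

# The wrapper and <p> tags are assumed (the original markup was lost);
# the class name and the <b> tags match the selectors used below.
html_doc = '''
<div class="_1dcffiq">
   <p>Join the Adventure</p>
   <p>If you're <b>passionate</b> about <b>improving the health</b> of others and working on a <b>big problem</b>.</p>
   <p>Thank you</p>
</div>
'''

soup = BeautifulSoup(html_doc, 'html.parser')
element = soup.select_one('._1dcffiq')

# Replace every <b> tag with its contents, then merge the adjacent strings.
for b in soup.select('b'):
    b.unwrap()
element.smooth()

print(element.get_text(strip=True, separator=' | '))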

Prints:

Join the Adventure | If you're passionate about improving the health of others and working on a big problem. | Thank you
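Since the question asks for a generic solution, the same idea can be extended to any set of inline tags. The following is only a sketch: the function name text_ignoring_inline and the INLINE_TAGS list are assumptions made for illustration, not part of Beautiful Soup. It works on a copy, so the original tree is left untouched.

```python
import copy

from bs4 import BeautifulSoup

# Assumed list of tags to treat as inline; adjust it to your documents.
INLINE_TAGS = ["a", "b", "em", "i", "span", "strong", "u"]


def text_ignoring_inline(element, separator=" | ", inline_tags=INLINE_TAGS):
    """Return the element's text, separated only at non-inline tag boundaries."""
    # copy.copy() on a Tag copies the tag together with its children,
    # so unwrapping below does not modify the caller's tree.
    clone = copy.copy(element)
    # find_all() returns a list, so unwrapping while looping is safe.
    for tag in clone.find_all(inline_tags):
        tag.unwrap()
    # Merge the neighbouring strings left behind by unwrap().
    clone.smooth()
    return clone.get_text(separator=separator, strip=True)


if __name__ == "__main__":
    html_doc = (
        "<div><p>Join the Adventure</p>"
        "<p>If you're <b>passionate</b> about <em>improving</em> things.</p>"
        "<p>Thank you</p></div>"
    )
    element = BeautifulSoup(html_doc, "html.parser").div
    print(text_ignoring_inline(element))
```

Note that .smooth() requires Beautiful Soup 4.8.0 or later.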

Or:

Use .find_all() with recursive=False to select only the element's direct children, and then join their text:

text = ' | '.join(tag.text for tag in element.find_all(recursive=False))
print(text)

Prints:

Join the Adventure | If you're passionate about improving the health of others and working on a big problem. | Thank you
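If the direct children may contain line breaks, runs of spaces, or be empty, a slightly more defensive variant of this second approach could look like the sketch below. The helper name join_child_text is hypothetical, and the sketch assumes every direct child should become one segment of the output.

```python
from bs4 import BeautifulSoup


def join_child_text(element, separator=" | "):
    """Join the text of the element's direct child tags, one segment per child."""
    parts = []
    for child in element.find_all(recursive=False):
        # split() + join() collapses newlines and repeated spaces into single spaces.
        text = " ".join(child.get_text().split())
        if text:  # skip children that contain no visible text
            parts.append(text)
    return separator.join(parts)


if __name__ == "__main__":
    html_doc = (
        "<div><p>Join  the\n Adventure</p><p></p>"
        "<p>If you're <b>passionate</b>.</p></div>"
    )
    element = BeautifulSoup(html_doc, "html.parser").div
    print(join_child_text(element))
```

Unlike the unwrap approach, this does not modify the tree, but text placed directly inside the div (outside any child tag) is ignored.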
