How to get beautiful soup get_text() to consider line spacing for paragraph tags

Question

I am trying to convert HTML to text using the BeautifulSoup library. However, get_text() does not insert a space (or newline) between paragraph tags.

from bs4 import BeautifulSoup
test_input = '<p>this is sentence 1</p><p>this is sentence 2</p>'
soup = BeautifulSoup(test_input, 'html.parser')
print(soup.get_text())

Output: this is sentence 1this is sentence 2

Expectation: this is sentence 1 this is sentence 2

Can BeautifulSoup handle this somehow, or is there an alternative library I could use?

Akash senta · Accepted Answer

You can collect the text of each <p> tag and join the pieces yourself:

from bs4 import BeautifulSoup
test_input = '<p>this is sentence 1</p><p>this is sentence 2</p>'
soup = BeautifulSoup(test_input, 'html.parser')
data = soup.find_all('p')
output = " ".join([p1.text for p1 in data])

The output will be:

this is sentence 1 this is sentence 2

If you want each paragraph on its own line, change the join line to:

output = "\n".join([p1.text for p1 in data])

and the output will be:

this is sentence 1
this is sentence 2
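
As a possible alternative to joining the tags by hand, get_text() itself accepts separator and strip arguments. The following is a minimal sketch, assuming the input is two adjacent <p> elements with no whitespace between them (the variable names spaced and lined are only for illustration):

```python
from bs4 import BeautifulSoup

# Assumed input: two adjacent <p> tags with no whitespace between them.
test_input = '<p>this is sentence 1</p><p>this is sentence 2</p>'
soup = BeautifulSoup(test_input, 'html.parser')

# separator is placed between every text node; strip=True trims each
# node and drops nodes that contain only whitespace.
spaced = soup.get_text(separator=' ', strip=True)
lined = soup.get_text(separator='\n', strip=True)

print(spaced)
print(lined)
```

Note that the separator goes between every text node, not only between paragraphs, so inline tags such as <b> inside a paragraph would also be split by it. If that matters, joining the <p> tags explicitly as shown above may be the safer choice.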
