BeautifulSoup - getting rid of paragraph whitespace/line breaks

Question

similarlist = res.find_all_next("div", class_="result-wrapper")
for item in similarlist:
    print(item)

This returns:






Aa machen

   


to do a poo, to pooh

When I choose to print item.get_text() instead, I get

abgeneigt machen
to disincline




abhängig machen
2137

to predicate




Absenker machen
to layer

So basically there are a lot of newlines between the list items that I don't need. Is this because of the tags? How do I get rid of them?

Martijn Pieters · Accepted Answer

Yes, between tags the HTML contains whitespace (including newlines) too.
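To see where those newlines come from, here is a minimal sketch using the standard library's html.parser instead of BeautifulSoup, so it runs without extra installs. The markup in sample_html is a hypothetical stand-in for the page in the question, and TextCollector is an illustrative helper, not part of any library.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Illustrative helper: records every text chunk found between tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for all text content, including whitespace-only runs.
        self.chunks.append(data)


# Hypothetical markup; the real page's HTML is not shown in the question.
sample_html = (
    '<div class="result-wrapper">\n'
    '  <p>\n'
    '    Aa machen\n'
    '  </p>\n'
    '  <p>to do a poo, to pooh</p>\n'
    '</div>'
)

collector = TextCollector()
collector.feed(sample_html)
collector.close()

# Chunks made of nothing but whitespace are the indentation and line
# breaks between tags; a plain text dump keeps them as-is.
whitespace_only = [chunk for chunk in collector.chunks if not chunk.strip()]

for chunk in collector.chunks:
    print(repr(chunk))
```

The whitespace-only chunks are exactly the kind of text that ends up as blank lines when all text is concatenated.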

You can easily collapse all multi-line whitespace with a regular expression:

import re

re.sub(r'\n\s*\n', r'\n\n', item.get_text().strip(), flags=re.M)

This replaces any run of whitespace (newlines, spaces, tabs, etc.) between two newlines with a single blank line.
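As a rough, self-contained illustration of that substitution, the sketch below applies it to a string that imitates the get_text() output from the question. raw_text, collapse_blank_lines and drop_blank_lines are names made up for this example; the second helper is a variant, not part of the answer, for the case where no blank lines should remain at all.

```python
import re

# Hypothetical sample imitating the get_text() output shown in the question.
raw_text = (
    "\nabgeneigt machen\nto disincline\n\n\n\n\n"
    "abhängig machen\n2137\n\nto predicate\n\n\n\n\n"
    "Absenker machen\nto layer\n"
)


def collapse_blank_lines(text):
    """Replace each whitespace run between two newlines with one blank line."""
    return re.sub(r'\n\s*\n', '\n\n', text.strip())


def drop_blank_lines(text):
    """Variant (an assumption, not from the answer): keep non-empty lines only."""
    return '\n'.join(line.strip() for line in text.splitlines() if line.strip())


collapsed = collapse_blank_lines(raw_text)
compact = drop_blank_lines(raw_text)

print(collapsed)
print('---')
print(compact)
```

The pattern has no ^ or $ anchors, so the flags=re.M argument does not change the result here and is left out of the sketch. If you would rather avoid the regex, BeautifulSoup's get_text() also accepts a separator and strip=True, for example item.get_text("\n", strip=True), which strips each text fragment and skips empty ones before joining.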
