get_text() problem managing <br> tag inside text

Question

I'm trying to extract data from an HTML table and, obviously, I'm using BeautifulSoup.

I managed to select the relevant tags and organize the data into a pandas DataFrame, but I have one small problem left to solve.

For example, suppose I have a variable column, an instance of bs4.element.Tag, whose value is:

<td>Valore di inizio<br/>esercizio</td>

When I call column.get_text(), it returns:

Valore di inizioesercizio

I'd like to get back:

Valore di inizio esercizio

i.e. the <br> tag should be removed and replaced with a space.
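Here is a minimal reproduction, assuming the cell markup is <td>Valore di inizio<br/>esercizio</td> (the HTML of the real table isn't shown, so this cell is an illustrative stand-in):

```python
from bs4 import BeautifulSoup

# Assumed markup for the cell: a <br> splits the label over two lines
html = "<td>Valore di inizio<br/>esercizio</td>"
column = BeautifulSoup(html, "html.parser").td  # a bs4.element.Tag

# get_text() joins the text fragments with an empty separator,
# so the two words on either side of <br> run together
print(column.get_text())  # Valore di inizioesercizio
```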

Thanks

Andrej Kesely · Accepted Answer

You can use get_text() with the separator= parameter:

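from bs4 import BeautifulSoup

# The <br> inside the cell is what makes plain get_text() glue the words together
data = '''<td>Valore di inizio<br/>esercizio</td>'''

soup = BeautifulSoup(data, 'html.parser')

# separator=' ' is inserted between each text fragment of the tag
print(soup.td.get_text(separator=' '))  # for more control, you can add the strip=True parameter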

Prints:

Valore di inizio esercizio
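The strip=True hint in the comment above matters when the cells carry extra whitespace around the <br>. A small sketch with a hypothetical two-cell row (the markup is made up for illustration) compares both calls:

```python
from bs4 import BeautifulSoup

# Hypothetical row: the first cell has padding around its text and the <br>
html = """
<table>
  <tr>
    <td>  Valore di inizio <br/> esercizio  </td>
    <td>Valore di fine<br/>esercizio</td>
  </tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
cells = soup.find_all("td")

# separator=' ' alone keeps the original padding, so spaces pile up
loose = [td.get_text(separator=" ") for td in cells]

# strip=True trims each text fragment (dropping whitespace-only ones) before joining
clean = [td.get_text(separator=" ", strip=True) for td in cells]

print(loose)
print(clean)  # ['Valore di inizio esercizio', 'Valore di fine esercizio']
```

A list like clean can then be used, for example, as the columns= argument of pandas.DataFrame. If only <br> should turn into a space (so that inline tags such as <b> inside a word don't add spaces too), another option is to call br.replace_with(' ') on each br tag before a plain get_text().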
