Marco Fumagalli

Reputation: 2477

get_text() joins text across a <br/> tag without a space

I'm trying to extract data from an HTML table using BeautifulSoup.

I managed to select the relevant tags and organize the data into a pandas DataFrame, but I have one small problem left to solve.

For example, suppose I have a variable column that is an instance of bs4.element.Tag with this value:

<td>Valore di inizio<br/>esercizio</td>

When I call column.get_text(), it returns:

Valore di inizioesercizio

I'd like to get back:

Valore di inizio esercizio

i.e. the br tag should be stripped and replaced with a space.

Thanks

Upvotes: 0

Views: 76

Answers (1)

Andrej Kesely

Reputation: 195428

You can still use get_text(), but pass the separator= parameter. The separator is inserted between the text fragments on either side of the br tag:

data = '''<td>Valore di inizio<br/>esercizio</td>'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(data, 'html.parser')

print(soup.td.get_text(separator=' '))  # add strip=True to also trim whitespace around each text fragment

Prints:

Valore di inizio esercizio
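
Since the cells end up in a pandas DataFrame, the same call can be applied to every td in a row. Below is a minimal sketch, assuming a made-up two-cell row (the second cell and the extra padding in the first are invented for illustration). With strip=True, each text fragment is trimmed before the fragments are joined with the separator, so stray whitespace around the cell text should not leak into the result:

```python
from bs4 import BeautifulSoup

# Hypothetical row: the first cell has padding spaces, the second is invented.
html = (
    "<table><tr>"
    "<td> Valore di inizio<br/>esercizio </td>"
    "<td>Valore di fine<br/>esercizio</td>"
    "</tr></table>"
)

soup = BeautifulSoup(html, "html.parser")

# separator=" " puts a space where the <br/> splits the text;
# strip=True trims each fragment and drops empty ones before joining.
cells = [td.get_text(separator=" ", strip=True) for td in soup.find_all("td")]

print(cells)
# ['Valore di inizio esercizio', 'Valore di fine esercizio']
```

Without strip=True, the first cell would keep its leading and trailing spaces.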

Upvotes: 2
