Reputation: 2477
I'm trying to extract data from an HTML table using BeautifulSoup.
I managed to select the relevant tags and organize the data into a pandas DataFrame, but I have one small problem left to solve.
For example, suppose I have a variable column
which is an instance of bs4.element.Tag
whose value is equal to:
<td>Valore di inizio<br/>esercizio</td>
When I call column.get_text()
it returns:
Valore di inizioesercizio
I'd like to get back
Valore di inizio esercizio
i.e. the br tag
should be stripped and replaced with a space.
Thanks
Upvotes: 0
Views: 76
Reputation: 195428
You can use get_text()
with the separator=
parameter:
from bs4 import BeautifulSoup

data = '''<td>Valore di inizio<br/>esercizio</td>'''
soup = BeautifulSoup(data, 'html.parser')
# separator=' ' puts a space between adjacent text fragments;
# add strip=True to also trim whitespace around each fragment
print(soup.td.get_text(separator=' '))
Prints:
Valore di inizio esercizio
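Since you're collecting the cells for a DataFrame, here is a minimal sketch of applying the same call to every cell of a table. The table contents, the numeric values and the name rows are made up for illustration; only get_text(separator=..., strip=...) and find_all() come from BeautifulSoup itself.

```python
from bs4 import BeautifulSoup

# Hypothetical sample table, not taken from the question
html = """
<table>
  <tr><td>Valore di inizio<br/>esercizio</td><td> 1.000 </td></tr>
  <tr><td>Valore di fine<br/>esercizio</td><td> 2.500 </td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# One list per <tr>; each cell's text fragments are stripped
# and then joined with a single space
rows = [
    [td.get_text(separator=" ", strip=True) for td in tr.find_all("td")]
    for tr in soup.find_all("tr")
]
print(rows)
```

A list of lists like this can be passed straight to pandas.DataFrame(rows). With strip=True the padding around the numeric cells is removed as well, which is probably what you want before converting types.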
Upvotes: 2