Reputation: 487
I am looking for a piece of advice, as I am new to Python.
Let's imagine that I have multiple data blocks similar to the following one:
<td>
<a href="address.com" title="title">some title</a>
<br />
aaa<br />
bbb<br />
ccc</td>
Sometimes the number of <br /> tags differs; it is not constant across blocks.
My goal is to extract the data inside each td block into a file, but I am stuck here.
Is a regular expression the best approach here?
Thank you in advance.
Upvotes: 0
Views: 77
Reputation: 298156
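Parse the HTML with an HTML parser such as BeautifulSoup (pip install beautifulsoup4):
from bs4 import BeautifulSoup
html = """
<td> <a href="address.com" title="title">some title</a> <br /> aaa<br /> bbb<br /> ccc</td>
"""
# Name the parser explicitly; html.parser ships with Python and avoids a warning
soup = BeautifulSoup(html, "html.parser")
for td in soup.find_all('td'):
    # Join the text pieces with single spaces, stripping whitespace around each piece
    print(td.get_text(" ", strip=True))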
And the result:
some title aaa bbb ccc
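Since the question also asks about a varying number of <br /> tags and about writing the result to a file, here is a minimal sketch along the same lines. It assumes you want one output row per td, with each <br />-separated line as its own field, written as CSV; the CSV format and the helper names extract_td_rows and write_rows are hypothetical choices, not something from the question. It relies on BeautifulSoup's stripped_strings, which yields every text node inside the td with surrounding whitespace removed, so no regular expression or <br /> counting is needed.

```python
import csv
import io

from bs4 import BeautifulSoup


def extract_td_rows(html):
    """Return one list of text pieces per <td>, however many <br /> it contains."""
    soup = BeautifulSoup(html, "html.parser")
    # stripped_strings skips whitespace-only text nodes and strips the rest,
    # so each <br />-separated line ends up as one list item
    return [list(td.stripped_strings) for td in soup.find_all("td")]


def write_rows(rows, fileobj):
    """Hypothetical output format: write each <td> as one CSV row."""
    writer = csv.writer(fileobj)
    writer.writerows(rows)


html = """
<td>
<a href="address.com" title="title">some title</a>
<br />
aaa<br />
bbb<br />
ccc</td>
<td>
<a href="address.com" title="title">other title</a>
<br />
ddd</td>
"""

rows = extract_td_rows(html)
print(rows)  # [['some title', 'aaa', 'bbb', 'ccc'], ['other title', 'ddd']]

# An in-memory buffer stands in for a real file here
buffer = io.StringIO()
write_rows(rows, buffer)
print(buffer.getvalue())
```

To write to an actual file, open it with newline="" as the csv module recommends, e.g. with open("output.csv", "w", newline="") as f: write_rows(rows, f) (the filename is just an example).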
Upvotes: 5