Lormitto

Reputation: 487

How to convert HTML data using Python

I am looking for a piece of advice, as I am a newbie to Python.

Let's imagine that I have multiple data blocks similar to the following one:

<td> <a href="address.com" title="title">some title</a> <br /> aaa<br /> bbb<br /> ccc</td>

The number of br tags is not constant and differs from block to block.

My goal is to extract the data from inside each td block and write it to a file, but I am stuck here.

Is a regular expression the best approach here?

Thank you in advance.

Upvotes: 0

Views: 77

Answers (1)

Blender

Reputation: 298156

Parse the HTML with an HTML parser like BeautifulSoup (pip install beautifulsoup4):

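from bs4 import BeautifulSoup

html = """
<td> <a href="address.com" title="title">some title</a> <br /> aaa<br /> bbb<br /> ccc</td>
"""

# Name the parser explicitly so the result does not depend on which
# parsers happen to be installed (and to avoid a warning from bs4).
soup = BeautifulSoup(html, 'html.parser')

# get_text() concatenates every text node inside the tag.
for td in soup.find_all('td'):
    print(td.get_text())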

And the result:

 some title  aaa bbb ccc
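
If the goal is to get each piece of text as a separate value and write it to a file, a minimal sketch along the following lines may help. It assumes one output line per td block with tab-separated values; the helper names extract_rows and write_rows are made up for illustration and are not part of BeautifulSoup. It relies on stripped_strings, which yields each text fragment with the surrounding whitespace removed, so the number of br tags should not matter.

```python
from bs4 import BeautifulSoup

html = """
<td> <a href="address.com" title="title">some title</a> <br /> aaa<br /> bbb<br /> ccc</td>
"""


def extract_rows(markup):
    """Return one list of text fragments per <td> block (hypothetical helper)."""
    soup = BeautifulSoup(markup, "html.parser")
    # stripped_strings skips whitespace-only text nodes and strips the rest.
    return [list(td.stripped_strings) for td in soup.find_all("td")]


def write_rows(rows, path):
    """Write each row as one tab-separated line (the format is an assumption)."""
    with open(path, "w", encoding="utf-8") as handle:
        for row in rows:
            handle.write("\t".join(row) + "\n")


rows = extract_rows(html)
print(rows)  # [['some title', 'aaa', 'bbb', 'ccc']]
```

Calling write_rows(rows, "output.tsv") would then write one line per td block; the file name is only an example. If the fragments themselves may contain tabs or newlines, the csv module from the standard library is probably a safer way to write them.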

Upvotes: 5
