Reputation: 84
I have an HTML structure like the one below:
<tr>
<td> AAA </td>
</tr>
<tr>
<td><a> BBB </a></td>
</tr>
// more rows like the ones above...
How can I select the values inside the <td> tags? I want a list like ['AAA', 'BBB', ...].
I tried the query below, but it fails to extract the value of the second table row because the text there is wrapped in an <a> tag.
//table//td[1]/text()
Can anyone suggest a more generic XPath query that captures the values of all the <td> entries?
Thanks
Upvotes: 0
Views: 30
Reputation: 2084
I'm using BeautifulSoup to parse your HTML. To install it, run: pip install beautifulsoup4
from bs4 import BeautifulSoup
html_string = """
<table>
<thead>
<tr>
<th>Programming Language</th>
<th>Creator</th>
<th>Year</th>
</tr>
</thead>
<tbody>
<tr>
<td><a> BBB </a></td>
<td>Dennis Ritchie</td>
<td>1972</td>
</tr>
<tr>
<td>Python</td>
<td>Guido Van Rossum</td>
<td>1989</td>
</tr>
<tr>
<td>Ruby</td>
<td>Yukihiro Matsumoto</td>
<td>1995</td>
</tr>
</tbody>
</table>
"""
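my_list = []
soup = BeautifulSoup(html_string, "html.parser")
# find_all("td") returns every <td>, whether its text is direct or nested in <a>
samples = soup.find_all("td")
for cell in samples:
    # get_text() gathers text from all descendants; strip=True trims surrounding whitespace
    text = cell.get_text(strip=True)
    print(text)
    my_list.append(text)
print(my_list)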
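If you would rather stay close to the XPath approach from the question, the likely reason the original query misses BBB is that td[1]/text() only returns text nodes that are direct children of <td>, while BBB sits one level down inside <a>. In a full XPath 1.0 engine, an expression such as //table//td//text() should pick up nested text as well. Here is a minimal sketch of the same idea using only the standard library's xml.etree.ElementTree, assuming the markup is well-formed enough to parse as XML (real-world HTML often is not). The names html_fragment and values are just illustrative.

```python
import xml.etree.ElementTree as ET

# Assumption: the fragment from the question, wrapped in a <table>
# so that it has a single root element and parses as XML.
html_fragment = """
<table>
  <tr>
    <td> AAA </td>
  </tr>
  <tr>
    <td><a> BBB </a></td>
  </tr>
</table>
"""

root = ET.fromstring(html_fragment.strip())

# itertext() walks every text node under each <td>, including text
# nested in child tags such as <a>; strip() trims the padding spaces.
values = ["".join(td.itertext()).strip() for td in root.iter("td")]
print(values)  # ['AAA', 'BBB']
```

ElementTree does not support the text() node test, which is why the sketch uses itertext() instead of a pure XPath string.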
Upvotes: 1