hasi90

Reputation: 84

How to select the innermost child value of HTML elements using XPath

I have an HTML structure like the one below:

<tr>
<td> AAA </td>
</tr>
<tr>
<td><a> BBB </a></td>
</tr>

<!-- more rows like the ones above... -->

How can I select the values inside the <td> tags? I want a list like ['AAA', 'BBB', ...].

I tried the query below, but it fails to extract the value of the second table row because an <a> tag is present there.

//table//td[1]/text()
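A minimal sketch that reproduces the problem, assuming the lxml library (`pip install lxml`) and a <table> wrapper around the rows, which the question implies but does not show. The step `/text()` only matches text nodes that are direct children of <td>, so the text inside <a> is skipped, while `//text()` also descends into child elements:

```python
from lxml import html

# The question's rows, wrapped in a <table> so that //table matches
# (the wrapper is an assumption; the question only shows the <tr> rows).
doc = html.fromstring("""
<table>
<tr>
<td> AAA </td>
</tr>
<tr>
<td><a> BBB </a></td>
</tr>
</table>
""")

# td[1]/text(): only text nodes that are *direct* children of the <td>.
# In the second row the text belongs to <a>, so that row contributes nothing.
direct = [t.strip() for t in doc.xpath("//table//td[1]/text()")]

# td[1]//text(): text nodes at any depth below the <td>, including inside <a>.
descendant = [t.strip() for t in doc.xpath("//table//td[1]//text()")]

print(direct)      # ['AAA']
print(descendant)  # ['AAA', 'BBB']
```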

Can anyone suggest a more generic XPath query to capture the values of all the <td> entries?

Thanks

Upvotes: 0

Views: 30

Answers (1)

GiovaniSalazar

Reputation: 2084

I'm using BeautifulSoup to parse your HTML. To install BeautifulSoup, run: pip install beautifulsoup4

from bs4 import BeautifulSoup

html_string = """
<table>
  <thead>
    <tr>
      <th>Programming Language</th>
      <th>Creator</th>
      <th>Year</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><a> BBB </a></td>
      <td>Dennis Ritchie</td>
      <td>1972</td>
    </tr>
    <tr>
      <td>Python</td>
      <td>Guido Van Rossum</td>
      <td>1989</td>
    </tr>
    <tr>
      <td>Ruby</td>
      <td>Yukihiro Matsumoto</td>
      <td>1995</td>
    </tr>
  </tbody>
</table>
"""
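# find_all("td") returns every cell; get_text() collects the text of all
# descendants, so text nested inside <a> (or any other tag) is included
soup = BeautifulSoup(html_string, "html.parser")
samples = soup.find_all("td")

my_list = []
for cell in samples:
    # strip=True trims surrounding whitespace, so " BBB " becomes "BBB"
    text = cell.get_text(strip=True)
    print(text)
    my_list.append(text)

print(my_list)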
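Since the question asks for XPath specifically, here is a rough lxml-based sketch of the same idea (assuming lxml is installed; the third row with mixed content is a hypothetical addition for illustration). It selects each <td> element first and then takes its string value with `normalize-space(.)`, which concatenates all descendant text and trims whitespace. This yields exactly one entry per cell, whereas `//td//text()` would return one entry per text node and split a cell like `CC<b>C</b>` in two:

```python
from lxml import html

# Hypothetical input: the question's rows wrapped in a <table>, plus a
# mixed-content cell to show that nesting depth does not matter.
page = html.fromstring("""
<table>
  <tr><td> AAA </td></tr>
  <tr><td><a> BBB </a></td></tr>
  <tr><td>CC<b>C</b></td></tr>
</table>
""")

# Select the <td> elements, then evaluate normalize-space(.) relative to each
# one; str() turns lxml's "smart string" result into a plain Python str.
cells = page.xpath("//table//td")
values = [str(td.xpath("normalize-space(.)")) for td in cells]

print(values)  # ['AAA', 'BBB', 'CCC']
```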

Upvotes: 1
