Difender

Reputation: 525

XPath parses the whole page when I specify not to

I'm parsing websites using Python and XPath.

What I'm trying to do is extract the href from the <a> element.

Here is the structure of the page:

<div id="posts">
  <div align="center">
    <table>
      <tbody>
        <tr>
          <td>
          <td>
            <a href="test01">
        <tr>
          <td>
        <tr>
          <td>
  <div align="center">
    <table>
      <tbody>
        <tr>
          <td>
          <td>
            <a href="test01">
        <tr>
          <td>
        <tr>
          <td>

And here's the code I wrote:

posts = page.xpath("//div[@id='posts']/div[@align='center']")
for post in posts:
  print post.xpath("//table/tr[1]/td[2]/a/@href")

But the problem is that I end up with every href in posts, not just the one from the current post.

What am I doing wrong?

Upvotes: 1

Views: 50

Answers (1)

Keith Hall

Reputation: 16085

An XPath expression that starts with a / character is evaluated from the document root node, not from the element you call it on. To make the expression relative to the context node, put a . before the /.

So your code should be:

posts = page.xpath("//div[@id='posts']/div[@align='center']")
for post in posts:
  print post.xpath(".//table/tr[1]/td[2]/a/@href")
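
To see the difference side by side, here is a minimal sketch, assuming lxml is installed and Python 3 is used (hence print() as a function). The SAMPLE markup is a made-up stand-in for the page in the question, with two distinct href values ("test01" and "test02") so the results are easy to tell apart.

```python
from lxml import html

# Hypothetical sample markup (not the asker's real page): two posts,
# each with one link in the second cell of the first table row.
SAMPLE = """
<div id="posts">
  <div align="center">
    <table>
      <tr><td>first</td><td><a href="test01">link</a></td></tr>
    </table>
  </div>
  <div align="center">
    <table>
      <tr><td>second</td><td><a href="test02">link</a></td></tr>
    </table>
  </div>
</div>
"""

page = html.fromstring(SAMPLE)
posts = page.xpath("//div[@id='posts']/div[@align='center']")

# Leading "//": evaluated from the document root, so the context node is ignored.
absolute = [post.xpath("//table/tr[1]/td[2]/a/@href") for post in posts]

# Leading ".//": evaluated relative to each post element.
relative = [post.xpath(".//table/tr[1]/td[2]/a/@href") for post in posts]

print(absolute)  # every post yields both hrefs
print(relative)  # every post yields only its own href
```

With the absolute form, each iteration returns the hrefs of all posts; with the relative form, each iteration returns only the href inside the current post.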

Upvotes: 1
