Reputation: 525
I'm parsing websites using Python and XPath.
What I'm trying to do is extract the href from the <a> element.
Here's what the XML (page) looks like:
<div id="post">
  <div align="center">
    <table>
      <tbody>
        <tr>
          <td>
          <td>
            <a href="test01">
        <tr>
          <td>
        <tr>
          <td>
  <div align="center">
    <table>
      <tbody>
        <tr>
          <td>
          <td>
            <a href="test01">
        <tr>
          <td>
        <tr>
          <td>
And here's the code I wrote:
posts = page.xpath("//div[@id='posts']/div[@align='center']")
for post in posts:
    print(post.xpath("//table/tr[1]/td[2]/a/@href"))
But the problem is that I end up with every href from all the posts,
not just the one from the current post.
What am I doing wrong?
Upvotes: 1
Views: 50
Reputation: 16085
An XPath starting with a / character is evaluated from the document root node, regardless of the element you call it on. To make the XPath relative to the context node, put a . before the /.
So your code should be:
posts = page.xpath("//div[@id='posts']/div[@align='center']")
for post in posts:
    # The leading "." makes the search start at the current post
    print(post.xpath(".//table/tr[1]/td[2]/a/@href"))
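To see the difference side by side, here is a minimal sketch that assumes lxml is the library behind page.xpath. The sample markup and the href values "first" and "second" are invented for illustration and are not taken from the question's page:

```python
from lxml import html

# Hypothetical sample page: two posts, each holding one link.
# The href values are made up so the two posts can be told apart.
PAGE = """
<html><body>
<div id="posts">
  <div align="center">
    <table>
      <tr><td>a</td><td><a href="first">one</a></td></tr>
    </table>
  </div>
  <div align="center">
    <table>
      <tr><td>b</td><td><a href="second">two</a></td></tr>
    </table>
  </div>
</div>
</body></html>
"""

page = html.fromstring(PAGE)
posts = page.xpath("//div[@id='posts']/div[@align='center']")

# "//..." is evaluated from the document root, so each post sees every link.
absolute = [post.xpath("//table/tr[1]/td[2]/a/@href") for post in posts]

# ".//..." is evaluated from the current post, so each post sees only its own link.
relative = [post.xpath(".//table/tr[1]/td[2]/a/@href") for post in posts]

print(absolute)
print(relative)
```

With this sample, the absolute query returns both hrefs for every post, while the relative query returns one href per post. Note that table/tr only matches rows that are direct children of the table; if the real page source contains a <tbody> element, as the tree in the question suggests, the path would probably need to be table/tbody/tr instead.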
Upvotes: 1