Reputation: 3015
I need to fetch and then parse an array of divs from an HTML page. I wrote this:
def parse_public(self, response):
    hxs = Selector(response)
    posts = hxs.xpath("//*div[matches(@id, 'wall-28701979_\d{5}')")
    # or something like this
    # posts = hxs.findall("//div[starts-with(@id,'wall-28701979_')")
    print posts
The full XPath is: //*[@id="wall-28701979_XXXXX"]/div[2]/div[1]/text()
where XXXXX is a random five-digit number, so I need to get all such elements from the page. But I get an exceptions.ValueError: Invalid XPath error. How can I fix it? Thanks.
Upvotes: 0
Views: 1161
Reputation: 474001
matches() is available only in XPath 2.0. Scrapy (well, lxml) supports only XPath 1.0.
You are also missing the closing ], but that is not the main problem here.
Instead, you can use starts-with():
hxs.xpath("//div[starts-with(@id, 'wall-28701979_')]")
Alternatively, you can use re:test(), the EXSLT regular-expression extension that Scrapy selectors support. Demo from the Scrapy shell:
$ cat index.html
<div>
<div id="wall-28701979_12345">test1</div>
<div id="wall-28701979_21231">test2</div>
<div id="wall-28701979_31233">test3</div>
</div>
$ scrapy shell index.html
>>> response.xpath('//div[re:test(@id, "wall-28701979_\d{5}")]/text()').extract()
[u'test1', u'test2', u'test3']
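The two approaches are not quite equivalent: starts-with() accepts any suffix after the prefix, while an anchored re:test() pattern can require exactly five digits. Here is a minimal sketch of the difference using lxml directly, outside Scrapy. The sample page (including the "wall-28701979_draft" id) is made up for illustration, and the re prefix is bound by hand to the EXSLT namespace, which I assume is roughly what Scrapy selectors do for you.

```python
from lxml import html

# Hypothetical sample page; the last div has a non-numeric suffix.
PAGE = """
<div>
  <div id="wall-28701979_12345">test1</div>
  <div id="wall-28701979_21231">test2</div>
  <div id="wall-28701979_draft">other</div>
</div>
"""

# EXSLT regular-expression namespace, bound to the "re" prefix by hand.
EXSLT_RE = {"re": "http://exslt.org/regular-expressions"}

tree = html.fromstring(PAGE)

# Plain XPath 1.0: matches every id that begins with the prefix.
by_prefix = tree.xpath("//div[starts-with(@id, 'wall-28701979_')]/text()")

# EXSLT extension: the anchored pattern requires exactly five digits.
by_regex = tree.xpath(
    r"//div[re:test(@id, '^wall-28701979_\d{5}$')]/text()",
    namespaces=EXSLT_RE,
)

print(by_prefix)
print(by_regex)
```

If every id on the real page does end in five digits, starts-with() is the simpler choice; the regex version only matters when other ids share the same prefix.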
Upvotes: 1