Oleksandr H

Reputation: 3015

How to get array of items using xpath in python scrapy?

I need to fetch and then parse an array of divs from an HTML page. I wrote this:

def parse_public(self, response):
    hxs = Selector(response)
    posts = hxs.xpath("//*div[matches(@id, 'wall-28701979_\d{5}')")
    # or something like this
    # posts = hxs.findall("//div[starts-with(@id,'wall-28701979_')")
    print posts

The full XPath is //*[@id="wall-28701979_XXXXX"]/div[2]/div[1]/text(), where XXXXX is 5 random digits, so I need to get all elements like this from the page. But I got exceptions.ValueError: Invalid XPath: . How can I fix it? Thanks

Upvotes: 0

Views: 1161

Answers (1)

alecxe

Reputation: 474001

matches() is available only in XPath 2.0. Scrapy (well, lxml) supports only XPath 1.0.

You are also missing the closing ], but it is not really important here.


Instead, you can use starts-with():

hxs.xpath("//div[starts-with(@id, 'wall-28701979_')]")

Or, alternatively, you can use re:test, an EXSLT regular-expression extension that Scrapy selectors support. Demo from the scrapy shell:

$ cat index.html
<div>
    <div id="wall-28701979_12345">test1</div>
    <div id="wall-28701979_21231">test2</div>
    <div id="wall-28701979_31233">test3</div>
</div>
$ scrapy shell index.html
>>> response.xpath('//div[re:test(@id, "wall-28701979_\d{5}")]/text()').extract()
[u'test1', u'test2', u'test3']

Upvotes: 1
