How to get array of items using xpath in python scrapy?

Question

I need to fetch and then parse an array of divs from an HTML page. I wrote this:

def parse_public(self, response):
    hxs = Selector(response)
    posts = hxs.xpath("//*div[matches(@id, 'wall-28701979_\d{5}')")
    # or something like this
    # posts = hxs.findall("//div[starts-with(@id,'wall-28701979_')")
    print posts

The full XPath is //*[@id="wall-28701979_XXXXX"]/div[2]/div[1]/text(), where XXXXX is a random 5-digit number, so I need to get all elements like this from the page. Instead I got exceptions.ValueError: Invalid XPath: . How can I fix it? Thanks

alecxe · Accepted Answer

matches() is available only in XPath 2.0. Scrapy (well, lxml) supports only XPath 1.0.

You are also missing the closing ] (and //*div should be //div), but that is not the main problem here.

Instead, you can use starts-with():

hxs.xpath("//div[starts-with(@id, 'wall-28701979_')]")
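
Note that starts-with() on its own accepts any suffix after the prefix, not only five digits. If that matters, the check can be tightened with XPath 1.0 functions alone. A minimal sketch using lxml directly (the library Scrapy's selectors are built on); the markup and its ids are made up for illustration:

```python
from lxml import html

# Hypothetical markup: the ids below are invented for this example.
doc = html.fromstring("""
<html><body>
  <div id="wall-28701979_12345">test1</div>
  <div id="wall-28701979_99">too short</div>
  <div id="wall-28701979_abcde">not digits</div>
  <div id="other-1">unrelated</div>
</body></html>
""")

prefix = "wall-28701979_"

# Prefix match only: every id that begins with the prefix.
loose = doc.xpath("//div[starts-with(@id, $p)]/text()", p=prefix)

# Prefix match plus "exactly five digits after the prefix".
# translate() deletes all digits, so an empty result means digits only.
strict = doc.xpath(
    "//div[starts-with(@id, $p)"
    " and string-length(substring-after(@id, $p)) = 5"
    " and translate(substring-after(@id, $p), '0123456789', '') = '']/text()",
    p=prefix,
)

print(loose)
print(strict)
```

The same expression string should work unchanged inside hxs.xpath(...) once the prefix is written inline instead of passed as the $p variable.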

Or, alternatively, you can use re:test(), an EXSLT regular-expression extension that Scrapy selectors support. Demo from the Scrapy shell:

$ cat index.html

    test1
    test2
    test3

$ scrapy shell index.html
>>> response.xpath('//div[re:test(@id, "wall-28701979_\d{5}")]/text()').extract()
[u'test1', u'test2', u'test3']
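
Outside the Scrapy shell, the same idea can be tried with lxml alone. There the re prefix has to be bound to the EXSLT regular-expressions namespace explicitly (Scrapy selectors register it for you). A sketch with made-up markup that mirrors the div[2]/div[1] nesting from the question; all ids and texts are assumptions:

```python
from lxml import html

# EXSLT regular-expressions namespace; Scrapy pre-registers it as "re".
EXSLT_RE = {"re": "http://exslt.org/regular-expressions"}

# Hypothetical page: ids and texts are invented for illustration.
doc = html.fromstring("""
<html><body>
  <div id="wall-28701979_12345">
    <div>header</div>
    <div><div>test1</div><div>extra</div></div>
  </div>
  <div id="wall-28701979_54321">
    <div>header</div>
    <div><div>test2</div></div>
  </div>
  <div id="wall-28701979_1">
    <div>header</div>
    <div><div>wrong id</div></div>
  </div>
</body></html>
""")

# ^ and $ anchor the pattern so the whole id must match.
posts = doc.xpath(
    r'//div[re:test(@id, "^wall-28701979_\d{5}$")]',
    namespaces=EXSLT_RE,
)

# Relative path from each matched post to the nested text.
texts = [post.xpath("string(div[2]/div[1])") for post in posts]
print(texts)
```

Without the ^ and $ anchors the pattern would also accept ids with more than five digits after the underscore, since re:test() only looks for a match somewhere in the string.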
