Retrieve information from table rows using xpath in scrapy

Question

I'm trying to use scrapy to extract information from an HTML table and store it in a database. The information is spread across rows, and there is no way to distinguish one record from another. (The site I'm crawling is http://www.ets.gr/frontoffice/portal.asp?cpage=NODE&cnode=12.)

How can I loop over every row of the table and get the information in the form of:

Record1: tr[1] and tr[2] (skip tr[3])
Record2: tr[4] and tr[5] (skip tr[6])
Record3: tr[7] and tr[8] (skip tr[9])
and so on...?

The nodes I'm selecting to loop over are:
nodes = hxs.xpath("//table/tr/td/table/tr/td/table/tr/td/table/tr")

Jens Erat · Accepted Answer

Constructing these grouped records is not possible using XPath 1.0 alone (and that's all scrapy supports); you will have to use Python code for that, after pulling the rows out with XPath.
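A minimal sketch of that Python step, assuming the rows really do repeat in blocks of three with the third one being a separator, as the question describes. The helper name group_records and its parameters are made up for illustration, and the list of strings stands in for the selector list returned by hxs.xpath(...):

```python
def group_records(rows, stride=3, keep=2):
    """Split rows into chunks of `stride` and keep the first `keep` rows of each."""
    records = []
    for start in range(0, len(rows), stride):
        chunk = rows[start:start + keep]
        if len(chunk) == keep:  # ignore an incomplete trailing record
            records.append(tuple(chunk))
    return records


# Stand-in for the selector list from the question (assumption: 9 rows).
nodes = ["tr1", "tr2", "tr3", "tr4", "tr5", "tr6", "tr7", "tr8", "tr9"]
records = group_records(nodes)
for first_row, second_row in records:
    print(first_row, second_row)
```

Each tuple then holds the two rows of one record, so the fields can be read from both rows with relative XPath expressions inside the loop.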

If you want to omit the third/sixth/... row from the start, use position() and modulo:

//table/tr/td/table/tr/td/table/tr/td/table/tr[(position() mod 3) != 0]

Alternatively, use the @valign attribute like metaphy proposed:

//table/tr/td/table/tr/td/table/tr/td/table/tr[@valign = 'top']
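Once the separator rows are filtered out, the remaining rows alternate between the first and second row of each record, so they can be paired in Python. The sketch below is only an illustration: the sample markup is invented, it assumes both data rows carry valign="top" while the separator row does not, and it uses the standard library's ElementTree as a dependency-free stand-in for scrapy's selectors (with scrapy, the rows would come from the xpath call shown in the question instead):

```python
import xml.etree.ElementTree as ET

# Hypothetical markup imitating the layout described in the question.
SAMPLE = """
<table>
  <tr valign="top"><td>name A</td></tr>
  <tr valign="top"><td>details A</td></tr>
  <tr><td>separator</td></tr>
  <tr valign="top"><td>name B</td></tr>
  <tr valign="top"><td>details B</td></tr>
  <tr><td>separator</td></tr>
</table>
"""

table = ET.fromstring(SAMPLE)

# Same idea as the @valign predicate above: keep only the data rows.
rows = table.findall("./tr[@valign='top']")

# Pair rows 0+1, 2+3, ... so that each pair forms one record.
pairs = list(zip(rows[0::2], rows[1::2]))
records = [(first.findtext("td"), second.findtext("td")) for first, second in pairs]

for record in records:
    print(record)
```

The same slicing works on the rows selected with the position() expression, since that filter also leaves exactly two rows per record.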
