axs203dd

Reputation: 25

Retrieve information from table rows using xpath in scrapy

I'm trying to use scrapy to extract information from an HTML table and store it in a database. The information is spread across rows, and nothing in the markup distinguishes one record from the next. (The site I'm crawling is http://www.ets.gr/frontoffice/portal.asp?cpage=NODE&cnode=12 )

How can I loop over every row of the table and get the information in the form of:

Record1: tr[1] and tr[2] (skip tr[3])
Record2: tr[4] and tr[5] (skip tr[6])
Record3: tr[7] and tr[8] (skip tr[9])
and so on?

The nodes I'm selecting to loop over are:
nodes = hxs.xpath("//table/tr/td/table/tr/td/table/tr/td/table/tr")

Upvotes: 0

Views: 605

Answers (1)

Jens Erat

Reputation: 38732

Constructing these results is not possible using XPath 1.0, and that's all scrapy supports. You will have to use Python code for that, after pulling the information using XPath.

If you want to omit the third/sixth/... row from the start, use position() and modulo:

//table/tr/td/table/tr/td/table/tr/td/table/tr[(position() mod 3) != 0]

Alternatively, use the @valign attribute like metaphy proposed:

//table/tr/td/table/tr/td/table/tr/td/table/tr[@valign = 'top']
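
The grouping step on the Python side could look roughly like the sketch below. It is only an illustration: the helper name group_rows and its stride/keep parameters are made up for this example, and plain strings stand in for the row selectors that hxs.xpath(...) would return.

```python
def group_rows(rows, stride=3, keep=2):
    """Yield one tuple per record: the first `keep` rows of every `stride` rows.

    `group_rows`, `stride` and `keep` are hypothetical names for this sketch.
    """
    rows = list(rows)
    for start in range(0, len(rows), stride):
        chunk = rows[start:start + stride]
        if len(chunk) >= keep:  # ignore a trailing incomplete record
            yield tuple(chunk[:keep])


# Stand-in strings for the row selectors returned by hxs.xpath(...)
rows = ["tr1", "tr2", "tr3", "tr4", "tr5", "tr6", "tr7", "tr8", "tr9"]
records = list(group_rows(rows))
# records == [("tr1", "tr2"), ("tr4", "tr5"), ("tr7", "tr8")]
```

With the unfiltered row list, the defaults (stride=3, keep=2) take tr[1] and tr[2] and skip tr[3], and so on. If the separator rows were already dropped by one of the XPath expressions above, stride=2 and keep=2 should pair up the remaining rows. In a spider, each element of a record would be a selector that can be queried further with a relative XPath.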

Upvotes: 2
