This is the HTML source from www.example.com that I want to scrape. Can anyone explain how to extract the text from each cell?
<table>
<tr>
<td colspan="5" style="text-align:left;padding-left:4px;" class="category"><img src="http://www.example.com/images/menu.gif">TEXT in td 1 </td>
</tr>
<tr>
<td class="date" colspan="5">TEXT in td 2</td>
</tr>
<tr>
<td style="test-align:left;width:40px;">TEXT in td 3</td>
<td style="padding-right:4px; width:180px;text-align:right">TEXT in td 4</td>
<td style="width:40px;text-align:center"> TEXT in td 5</td>
<td style="padding-left:5px; width:180px;text-align:left">TEXT in td 6</td>
<td style="width:40px;text-align:center"></td>
</tr>
</table>
This is my code. I want to extract the text of each cell separately. Texts 4, 5 and 6 are extracted correctly, but I can't get texts 1, 2 and 3. Can anyone tell me how to extract them? Thanks in advance!
item['TEXT in td 1'] = app.select('//td[2]//text()').extract()
item['TEXT in td 2'] = app.select('//td[3]/text()').extract()
item['TEXT in td 3'] = app.select('td[4]/text()').extract()
item['TEXT in td 5'] = app.select('td[3]//text()').extract()
item['TEXT in td 4'] = app.select('td[2]/text()').extract()
item['TEXT in td 6'] = app.select('td[4]/text()').extract()
This is an excerpt of the Scrapy output:
2013-08-04 11:27:11+0300 [app] DEBUG: Scraped from <200 />
{'TEXT in td 1': [u'', u'TEXT in td 1'],
'TEXT in td 2': [u'August 04'],
'TEXT in td 6': [],
'TEXT in td 5': [],
'TEXT in td 4': [],
'TEXT in td 6': []}
2013-08-04 11:27:11+0300 [app] DEBUG: Scraped from <200 />
{'TEXT in td 1': [u'', u'TEXT in td 1'],
'TEXT in td 2': [u'August 04'],
'TEXT in td 6': [u'TEXT in td 6'],
'TEXT in td 5': [u'TEXT in td 5'],
'TEXT in td 4': [u'TEXT in td 4'],
'TEXT in td 6': [u'TEXT in td 6']}
This would probably be done as follows (I don't have Scrapy installed to test it, but there is a problem with your XPaths):
# Row 1 holds td 1, row 2 holds td 2, row 3 holds td 3 to td 6 (plus an empty cell).
# td 1 uses //text() because its text follows the <img> child.
item['TEXT in td 1'] = app.select('//table/tr[1]/td[1]//text()').extract()
item['TEXT in td 2'] = app.select('//table/tr[2]/td[1]/text()').extract()
item['TEXT in td 3'] = app.select('//table/tr[3]/td[1]/text()').extract()
item['TEXT in td 4'] = app.select('//table/tr[3]/td[2]/text()').extract()
item['TEXT in td 5'] = app.select('//table/tr[3]/td[3]/text()').extract()
item['TEXT in td 6'] = app.select('//table/tr[3]/td[4]/text()').extract()
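What we are doing (assuming the page contains a single table) is selecting each row of the table (tr[1], tr[2], etc.) and then accessing the cells within that row (td[1], td[2], etc.). The index in td[n] counts only the td elements of that one row, so it restarts at 1 in every row; a bare //td[2] therefore matches every td that is the second cell of its own row, not the second cell in the document.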
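To check the row-then-cell indexing without Scrapy, here is a minimal sketch that swaps in Python's standard-library xml.etree.ElementTree for Scrapy's selectors. ElementTree supports only a small subset of XPath, but that subset includes the positional [n] predicates used above. It assumes a well-formed copy of the table from the question (the <img> tag self-closed, style attributes dropped), and the names TABLE, CELLS, cell_text, extract_cells and cells_at_position are hypothetical. The last function also illustrates that td[n] counts positions within a single row.

```python
import xml.etree.ElementTree as ET

# Hypothetical fixture: a well-formed copy of the question's table. The <img>
# tag is self-closed and style attributes are dropped, because ElementTree is
# a strict XML parser.
TABLE = """
<table>
<tr>
<td colspan="5" class="category"><img src="http://www.example.com/images/menu.gif"/>TEXT in td 1 </td>
</tr>
<tr>
<td class="date" colspan="5">TEXT in td 2</td>
</tr>
<tr>
<td>TEXT in td 3</td>
<td>TEXT in td 4</td>
<td> TEXT in td 5</td>
<td>TEXT in td 6</td>
<td></td>
</tr>
</table>
"""

# (row, cell) position of each value, both 1-based as in XPath.
CELLS = {
    "TEXT in td 1": (1, 1),
    "TEXT in td 2": (2, 1),
    "TEXT in td 3": (3, 1),
    "TEXT in td 4": (3, 2),
    "TEXT in td 5": (3, 3),
    "TEXT in td 6": (3, 4),
}


def cell_text(td):
    # itertext() yields the cell's own text plus the text and tails of its
    # children (e.g. the text after <img>), much like //text() in XPath.
    return "".join(td.itertext()).strip()


def extract_cells(markup):
    table = ET.fromstring(markup.strip())
    item = {}
    for key, (row, col) in CELLS.items():
        # Same row-then-cell path as //table/tr[row]/td[col], relative to <table>.
        td = table.find(f"tr[{row}]/td[{col}]")
        item[key] = cell_text(td) if td is not None else None
    return item


def cells_at_position(markup, n):
    # Like //td[n]: every td that is the n-th td of its own row.
    table = ET.fromstring(markup.strip())
    return [cell_text(td) for td in table.findall(f".//td[{n}]")]


if __name__ == "__main__":
    for key, value in extract_cells(TABLE).items():
        print(key, "->", repr(value))
    print(cells_at_position(TABLE, 2))
```

On this table, cells_at_position(TABLE, 2) returns only the text of td 4, since only the third row has a second cell. Unlike Scrapy's .extract(), which returns a list of raw text nodes, the sketch joins and strips them into one string per cell.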