Amith

Reputation: 73

Data missing while scraping website

I am trying to scrape a website (please refer to the URLs in the code). I want to scrape all the information from the site and write the data to a JSON file.

scrapy shell http://www.narakkalkuries.com/intimation.html

To extract the information from the website, I use:

response.xpath('//table[@class="MsoTableGrid"]/tr/td[1]//text()').re(r'[0-9,-/]+|[0-9]+')

I am able to retrieve most of the information from the website.

Concern: I can scrape the data under every "Intimation" tab except "Intimation For September 2017"; I am not able to scrape the information under that tab.

Finding:

For 'Intimation For September 2017', the value is stored in a span tag:

/html/body/div[4]/div[2]/div/table/tbody/tr[32]/td[1]/table/tbody/tr[1]/td[1]/p/b/span

For the remaining months, the values are stored in a font tag:

/html/body/div[4]/div[2]/div/table/tbody/tr[35]/td[1]/table/tbody/tr[2]/td[1]/p/b/span/font

How can I extract the information for "Intimation For September 2017"?

Upvotes: 1

Views: 289

Answers (1)

gangabass

Reputation: 10666

Your tables use different @class values (MsoTableGrid and MsoNormalTable), so you need some way to process all of them:

for table in response.xpath('//table[@width="519"]'):
    for row in table.xpath('./tr[position() > 1]'):
        for cell in row.xpath('./td'):
            # string(.) returns the cell's full text, however deeply it is nested
            cell_value = cell.xpath('string(.)').extract_first()
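
To see why selecting on the shared width attribute picks up both kinds of table, here is a minimal offline sketch. It is not the Scrapy code itself: it uses the standard library's xml.etree.ElementTree so it runs without fetching the page, the SAMPLE markup is a hypothetical stand-in for the real HTML, and extract_rows is an illustrative helper name. In it, "".join(cell.itertext()) plays the role of string(.), and the [1:] slice plays the role of position() > 1.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page: two tables that share
# width="519" but differ in @class and in how the cell text is nested.
SAMPLE = """
<html><body>
  <table class="MsoTableGrid" width="519">
    <tr><td>Header A</td><td>Header B</td></tr>
    <tr><td><p><b><span><font>12-08-2017</font></span></b></p></td><td>K1</td></tr>
  </table>
  <table class="MsoNormalTable" width="519">
    <tr><td>Header A</td><td>Header B</td></tr>
    <tr><td><p><b><span>15-09-2017</span></b></p></td><td>K2</td></tr>
  </table>
</body></html>
"""


def extract_rows(markup):
    """Return the text of every cell, one list per data row."""
    root = ET.fromstring(markup.strip())
    rows = []
    # Match on the shared width attribute instead of the differing class.
    for table in root.iterfind(".//table[@width='519']"):
        # Skip the first (header) row of each table.
        for row in table.findall("./tr")[1:]:
            # Join all nested text, whether it sits in <span> or <font>.
            cells = ["".join(cell.itertext()).strip() for cell in row.findall("./td")]
            rows.append(cells)
    return rows


rows = extract_rows(SAMPLE)
print(json.dumps(rows, indent=2))
```

In the Scrapy version, the same idea would mean appending cell_value to a list per row and yielding or dumping that list, which covers the "write to a JSON file" part of the question.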

Upvotes: 1
