pradhox

Reputation: 75

Scrapy extract table from website

I am a Python novice and am trying to write a script to extract the data from this page. Using scrapy, I wrote the following code:

import scrapy

class dairySpider(scrapy.Spider):
    name = "dairy_price"

    def start_requests(self):
        urls = [
            'http://www.dairy.com/market-prices/?page=quote&sym=DAH15&mode=i',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        for rows in response.xpath("//tr"):
            yield {
                'text': rows.xpath(".//td/text()").extract().strip('. \n'),

                }

However, this didn't scrape anything. Do you have any ideas? Thanks

Upvotes: 1

Views: 2578

Answers (1)

Faisal Umair

Reputation: 391

The table on the page http://www.dairy.com/market-prices/?page=quote&sym=DAH15&mode=i is added to the DOM dynamically by JavaScript, via a request to http://shared.websol.barchart.com/quotes/quote.php?page=quote&sym=DAH15&mode=i&domain=blimling&display_ice=&enabled_ice_exchanges=&tz=0&ed=0.

You should scrape the second URL instead of the first: scrapy.Request only returns the raw HTML source, not the content added afterwards by JavaScript.
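You can confirm this with scrapy shell: the selector used below (.bcQuoteTable tbody tr) matches nothing in the first page's raw HTML, but it does match rows on the quote.php URL. The commands below are a quick illustrative check, not part of the spider:

scrapy shell 'http://www.dairy.com/market-prices/?page=quote&sym=DAH15&mode=i'
>>> response.css(".bcQuoteTable tbody tr")   # empty selector list, no table in the raw HTML

scrapy shell 'http://shared.websol.barchart.com/quotes/quote.php?page=quote&sym=DAH15&mode=i&domain=blimling&display_ice=&enabled_ice_exchanges=&tz=0&ed=0'
>>> response.css(".bcQuoteTable tbody tr")   # the table rows show up here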

UPDATE

Here is working code for extracting the table data:

import scrapy

class dairySpider(scrapy.Spider):
    name = "dairy_price"

    def start_requests(self):
        urls = [
            "http://shared.websol.barchart.com/quotes/quote.php?page=quote&sym=DAH15&mode=i&domain=blimling&display_ice=&enabled_ice_exchanges=&tz=0&ed=0",
        ]

        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        # each <tr> of the quote table; print the text of all its cells
        for row in response.css(".bcQuoteTable tbody tr"):
            print(row.xpath("td//text()").extract())
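If you would rather use Scrapy's feed exports than print the rows, yielding a dict per row works the same way. This is a small variation on the parse method above; the 'cells' key and the output file name are just placeholders:

    def parse(self, response):
        for row in response.css(".bcQuoteTable tbody tr"):
            # one item per table row, all cell text collected into a list
            yield {'cells': row.xpath("td//text()").extract()}

You can then run it with, for example, scrapy runspider dairy_spider.py -o prices.json.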

Make sure you edit your settings.py file and change ROBOTSTXT_OBEY = True to ROBOTSTXT_OBEY = False, otherwise Scrapy will drop any request that the site's robots.txt disallows.
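For reference, the relevant line in settings.py looks like this:

# settings.py
ROBOTSTXT_OBEY = False  # stop Scrapy from filtering requests based on robots.txt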

Upvotes: 1
