V.Khakhil

Reputation: 265

Scrapy getting wrong values from request url

I am trying to extract the title from this page, but I am getting a different title that does not belong to the response URL. This is what I am trying:

import scrapy

class ElementSpider(scrapy.Spider):
    name = 'qwerty4'
    allowed_domains = ["burbank.com.au"]
    start_urls = ["https://www.burbank.com.au/victoria/home-details/alphington-153-179727", "https://www.burbank.com.au/victoria/home-details/sandringham-151-171569", "https://www.burbank.com.au/victoria/home-details/sandringham-151-181680", "https://www.burbank.com.au/victoria/home-details/bellfield-184-171585", "https://www.burbank.com.au/victoria/home-details/carlton-178-172662", "https://www.burbank.com.au/victoria/home-details/carlton-178-178079" ]

    def parse(self, response):
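        # extract_first() returns None instead of raising IndexError when nothing matches
        title = response.xpath('//div[@class="col-md-4 col-xs-12 col-sm-12"]/div[@class="housename"]/span/text()').extract_first()
        # print() with a single argument works on both Python 2 and Python 3
        print(response.url)
        print(title)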

However, some requests return the wrong data. (The output was posted as a screenshot.)

Please suggest how to resolve the issue.

Upvotes: 0

Views: 391

Answers (2)

Granitosaurus

Reputation: 21436

It seems the website stores view state between requests.

To get around that, you can get rid of Scrapy's concurrency by setting CONCURRENT_REQUESTS = 1.

Otherwise, you need to investigate further how the view state is generated. It could be IP-bound, which would mean you need proxies to get around it.

Upvotes: 0

bbanzzakji

Reputation: 92

They don't want their website to be scraped, so they added a technique to confuse scrapers.

In settings.py, change these fields:

CONCURRENT_REQUESTS = 1
DOWNLOAD_DELAY = 2

Upvotes: 1
