Reputation: 177
I am still learning Scrapy and am trying to scrape some information from this page: Schlotzskys store
However, after loading the page in the Scrapy shell, I run into issues parsing the address on the site.
First I run the following in the shell:
pipenv run scrapy shell https://www.schlotzskys.com/find-your-schlotzskys/arkansas/fayetteville/2146/
This works fine. I then tried to scrape the address in the following ways:
response.css('div.col-xs-12 col-sm-6 col-md-6')
response.css('div.container locations-mid-container')
response.xpath('//div[@class="locations-info"]')
response.css('div.locations-address')
The first two inputs above return:
[]
The last two inputs return:
[<Selector xpath="descendant-or-self::div[@class and contains(concat(' ', normalize-space(@class), ' '), ' locations-address ')]/text()" data='\n\t\t\t\t\t131 N. McPherson Church Rd.\t\t\t\t'>]
or a variant of that.
Now I looked at the HTML from:
print(response.text)
The HTML I am interested in does show up, but it does not seem to parse in Scrapy. It might be broken HTML. Is there any way around this?
I would appreciate anybody's help very much!
Upvotes: 1
Views: 532
Reputation: 10210
I couldn't find an element on the page with the CSS selector from your first expression. All your expressions are missing the extract() or extract_first() method call, so you are working with Selectors rather than strings.
Try this:
address = [
    response.xpath('normalize-space(//div[@class="locations-address"])').extract_first(),
    response.xpath('normalize-space(//div[@class="locations-address-secondary"])').extract_first(),
    response.xpath('normalize-space(//div[@class="locations-state-city-zip"])').extract_first()
]
The normalize-space() XPath function strips leading and trailing whitespace and collapses each internal run of whitespace into a single space.
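If the text has already been extracted as a Python string, a rough equivalent of normalize-space() can be sketched with the standard library alone. The helper name normalize_space is hypothetical, and str.split() treats a slightly wider set of characters as whitespace than XPath does (for example, the non-breaking space), so take this as an approximation rather than an exact port.

```python
def normalize_space(text: str) -> str:
    """Trim both ends and collapse each whitespace run to a single space."""
    # split() with no arguments splits on any whitespace run and drops
    # empty leading/trailing pieces; join() puts single spaces back.
    return " ".join(text.split())


# The raw value shown in the question's Selector output.
raw = "\n\t\t\t\t\t131 N. McPherson Church Rd.\t\t\t\t"
cleaned = normalize_space(raw)
print(cleaned)  # 131 N. McPherson Church Rd.
```

Doing the cleanup inside the XPath expression, as in the answer, keeps the normalization next to the selector; the Python version is handy when the same cleanup is needed on values that come from elsewhere.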
Upvotes: 1