Scrapy response.css /xpath with broken HTML. Any tips?

Question

I am still learning Scrapy and am trying to scrape some information from this page: Schlotzskys store

However, after loading the page in the Scrapy shell, I run into problems extracting the store address.

First I run the following in the shell:

pipenv run scrapy shell https://www.schlotzskys.com/find-your-schlotzskys/arkansas/fayetteville/2146/

This works fine. I then tried to select the address in the following ways:

response.css('div.col-xs-12 col-sm-6 col-md-6')
response.css('div.container locations-mid-container')
response.xpath('//div[@class="locations-info"]')
response.css('div.locations-address')

The first two inputs above return:

[]

The last two inputs return:

<Selector xpath="descendant-or-self::div[@class and contains(concat(' ', normalize-space(@class), ' '), ' locations-address ')]/text()" data=' 131 N. McPherson Church Rd. '>

or a variant of that.

I then looked at the raw HTML with:

print(response.text)

The HTML I am interested in does show up there, but the first two selectors do not find it. I suspect the HTML might be broken. Is there any way around this?

I would appreciate anybody's help very much!
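
If the markup really is malformed, one fallback I could imagine is running a lenient parser over response.text and collecting text by class name. The sketch below uses only the standard library's html.parser; the class ClassTextExtractor and the sample string are made up for illustration and are not part of Scrapy or of the real page.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ClassTextExtractor(HTMLParser):
    """Hypothetical helper: collect text inside elements with a given class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0      # greater than 0 while inside a matching element
        self.chunks = []    # stripped text fragments found so far

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or self.class_name in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


# Made-up sample with an unclosed <p>, standing in for response.text.
broken_html = """
<div class="locations-info"><p>Unclosed paragraph
<div class="locations-address"> 131 N. McPherson Church Rd. <br></div>
</div>
"""

parser = ClassTextExtractor("locations-address")
parser.feed(broken_html)
print(parser.chunks)
```

The depth counter is a simplification: it assumes tags inside the matched element are closed properly, so badly nested markup inside the target div could still confuse it.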

