WhiteDillPickle

Reputation: 185

CSS Selectors for Scrapy Web Scraping

I'm currently trying to scrape all the malls listed on the website

https://web.archive.org/web/20151112172204/http://www.simon.com/mall

using Python and Scrapy. I can't figure out how to extract the text "Anchorage 5th Avenue Mall".

<div class="st-country-padding">
    <h4><a class="no-underline" href="/web/20151112172204/http://www.simon.com/search/alaska%2b(ak)" title="View Malls In Alaska">Alaska</a></h4>
    <div>
        <a href="/web/20151112172204/http://www.simon.com/search/anchorage,+ak" title="Malls in Anchorage, AK">Anchorage</a>:
        <a href="http://www.simon.com/mall/anchorage-5th-avenue-mall" title="View Anchorage 5th Avenue Mall Website">Anchorage 5th Avenue Mall</a>
    </div>
</div>

I've tried a number of different approaches, including

response.css("a::attr(title)").extract()

But that doesn't give me what I'm looking for.

Note that Anchorage is just the first mall listed, so I can't match on its name directly: there are 200 or so different malls.

Upvotes: 0

Views: 2975

Answers (1)

BoltClock

Reputation: 723388

::attr(title) gives you the value of the title attribute. What you want is the text, so you need to use ::text instead.

Also, there doesn't appear to be a good way to identify the a element you want since it doesn't have anything that distinguishes it from the others, so a bit of pathing is necessary. Let me know if this works for you:

response.css(".st-country-padding > div > a:last-of-type::text").extract()

Upvotes: 1
