Reputation: 185
I'm currently trying to scrape all the malls listed on the website
https://web.archive.org/web/20151112172204/http://www.simon.com/mall
using Python and Scrapy. I can't figure out how to extract the text "Anchorage 5th Avenue Mall".
<div class="st-country-padding">
<h4><a class="no-underline" href="/web/20151112172204/http://www.simon.com/search/alaska%2b(ak)" title="View Malls In Alaska">Alaska</a></h4>
<div>
<a href="/web/20151112172204/http://www.simon.com/search/anchorage,+ak" title="Malls in Anchorage, AK">Anchorage</a>:
<a href="http://www.simon.com/mall/anchorage-5th-avenue-mall" title="View Anchorage 5th Avenue Mall Website">Anchorage 5th Avenue Mall</a>
</div>
</div>
I've tried a number of different approaches, including
response.css("a::attr(title)").extract()
But that doesn't give me what I'm looking for.
Note that Anchorage is just the name of the first mall, so I can't hard-code it in the selector; there are 200 or so different malls.
Upvotes: 0
Views: 2975
Reputation: 723388
::attr(title) gives you the value of the title attribute. What you want is the text, so you need to use ::text instead.
Also, there doesn't appear to be a good way to identify the a element you want, since nothing distinguishes it from the others, so a bit of pathing is necessary. Let me know if this works for you:
response.css(".st-country-padding > div > a:last-of-type::text").extract()
Upvotes: 1