Reputation: 45762
I am trying to scrape ratings from trustpilot.com.
Is it possible to extract a class name using Scrapy? The rating is displayed as five individual images, but those images sit inside a div whose class name encodes the rating. For example, if the rating is 2 stars:
<div class="star-rating count-2 size-medium clearfix">...
If it is 3 stars:
<div class="star-rating count-3 size-medium clearfix">...
So is there a way I can scrape the class count-2 or count-3, assuming a selector like .css('.star-rating')?
Upvotes: 7
Views: 15815
Reputation: 1
I had a similar question. Using Scrapy v1.5.1, I could extract attributes of elements by name. Here is an example used on Lowes; I did the same with the class attribute:
    # Each product card exposes its details as data-* attributes on the <li>.
    for product in response.css('ul.product-cards-grid li.product-wrapper'):
        prod_href = product.css('li::attr(data-producturl)').extract()
        prod_name = product.css('li::attr(data-producttitle)').extract_first()
        prod_img = product.css('li::attr(data-productimg)').extract_first()
        prod_id = product.css('li::attr(data-productid)').extract_first()
Upvotes: -2
Reputation: 10666
You can extract the rating directly using re_first() and re():
    for rating in response.xpath('//div[contains(@class, "star-rating")]/@class').re(r'count-(\d+)'):
        print(rating)
Upvotes: 4
Reputation: 43199
You could use a combination of both somewhere in your code:
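    import re

    # Grab the full class attribute of every .star-rating element.
    classes = response.css('.star-rating').xpath("@class").extract()
    for cls in classes:
        # Pick out the count-N token from the class string.
        match = re.search(r'\bcount-\d+\b', cls)
        if match:
            print("Class = {}".format(match.group(0)))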
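If you want the numeric rating rather than the class name itself, the same regex idea can be wrapped in a small helper. This is only a sketch using the standard library; rating_from_class and COUNT_RE are hypothetical names, and it assumes the rating always appears as a count-N token in the class string.

```python
import re

# Matches a standalone "count-N" token and captures N.
COUNT_RE = re.compile(r'\bcount-(\d+)\b')


def rating_from_class(class_attr):
    """Return the star rating encoded in a class string, or None if absent."""
    match = COUNT_RE.search(class_attr)
    return int(match.group(1)) if match else None


two_stars = rating_from_class("star-rating count-2 size-medium clearfix")
no_rating = rating_from_class("star-rating size-medium clearfix")
```

You could then call rating_from_class(cls) inside the loop over classes instead of printing the raw match.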
Upvotes: 7