Dan
Dan

Reputation: 45762

Extract class name in scrapy

I am trying to scrape rating off of trustpilot.com.

Is it possible to extract a class name using scrapy? I am trying to scrape a rating which is made up of five individual images but the images are in a class with the name of the rating for example if the rating is 2 starts then:

<div class="star-rating count-2 size-medium clearfix">...

if it is 3 stars then:

<div class="star-rating count-3 size-medium clearfix">...

So is there a way I can scrape the class count-2 or count-3 assuming a selector like .css('.star-rating')?

Upvotes: 7

Views: 15815

Answers (3)

KaizenClaus
KaizenClaus

Reputation: 1

I had a similar question. Using scrapy v1.5.1 I could extract attributes of elements by name. Here is an example used on Lowes; I did the same with the class attribute

    for product in response.css('ul.product-cards-grid li.product-wrapper'):
        prod_href = p.css('li::attr(data-producturl)').extract()
        prod_name = p.css('li::attr(data-producttitle)').extract_first()
        prod_img  = p.css('li::attr(data-productimg)').extract_first()
        prod_id   = p.css('li::attr(data-productid)').extract_first()

Upvotes: -2

gangabass
gangabass

Reputation: 10666

You can extract rating directly using re_first() and re():

for rating in response.xpath('//div[contains(@class, "star-rating")]/@class').re(r'count-(\d+)'):
    print(rating)

Upvotes: 4

Jan
Jan

Reputation: 43199

You could use a combination of both somewhere in your code:

import re

classes = response.css('.star-rating').xpath("@class").extract()
for cls in classes:
    match = re.search(r'\bcount-\d+\b', cls)
    if match:
        print("Class = {}".format(match.group(0))

Upvotes: 7

Related Questions