yesyouken

Reputation: 410

Having trouble accessing xpath attribute with scrapy

I am currently trying to scrape the following url: http://www.bedbathandbeyond.com/store/product/dyson-dc59-motorhead-cordless-vacuum/1042997979?categoryId=10562

On this page, I want to extract the number of reviews listed. That is, I want to extract the number 693.

This is my current xpath:

sel.xpath('//*[@id="BVRRRatingSummaryLinkReadID"]/a/span/span')

It only returns an empty list. Can someone suggest a correct XPath?

Upvotes: 1

Views: 230

Answers (2)

alecxe

Reputation: 474201

The initial page that Scrapy downloads contains no reviews. The reviews are loaded and built with JavaScript after the page loads, which makes things more complicated.

Basically, your options are:

Here is a working example of the low-level approach: parse the JavaScript response with slimit and json, extract the embedded HTML from it, and parse that HTML with BeautifulSoup:

import json

from bs4 import BeautifulSoup
import requests
from slimit import ast
from slimit.parser import Parser
from slimit.visitors import nodevisitor

ID = 1042997979

# The reviews are served as a JavaScript file from a separate endpoint
url = 'http://bedbathandbeyond.ugc.bazaarvoice.com/2009-en_us/{id}/reviews.djs?format=embeddedhtml&sort=submissionTime'.format(id=ID)

response = requests.get(url)

# Parse the JavaScript source (as text, not bytes) into a syntax tree
parser = Parser()
tree = parser.parse(response.text)

# Find the first object literal that contains the "BVRRSourceID" key
data = {}
for node in nodevisitor.visit(tree):
    if isinstance(node, ast.Object):
        data = json.loads(node.to_ecma())
        if "BVRRSourceID" in data:
            break

# The value under "BVRRSourceID" is an HTML fragment; parse it and read the count
soup = BeautifulSoup(data['BVRRSourceID'], 'html.parser')
print(soup.select('span.BVRRCount span.BVRRNumber')[0].text)

Prints 693.

To adapt the solution to Scrapy, you would need to make a request with Scrapy instead of requests, and parse the HTML with Scrapy instead of BeautifulSoup.

Upvotes: 4

fwu

Reputation: 369

You cannot do that. If you only fetch the HTML from this URL, you won't find the string 693 anywhere in it. This content must be created dynamically by some AJAX code.
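
One quick way to confirm this before writing any XPath is to check whether the markers you expect appear in the raw HTML at all. The sketch below uses a made-up helper, `missing_from_raw_html`, and a small stand-in string in place of the real page. In a Scrapy shell session you would pass `response.text` instead.

```python
def missing_from_raw_html(html, markers):
    """Return the markers that do not occur anywhere in the raw HTML text."""
    return [marker for marker in markers if marker not in html]


# Stand-in for a page whose review block is filled in later by JavaScript
raw_html = '<html><body><div id="reviews"></div><script src="bv.js"></script></body></html>'

missing = missing_from_raw_html(raw_html, ['BVRRRatingSummaryLinkReadID', '693'])
print(missing)
```

If the id used in the XPath shows up in the returned list, no selector run against that response can match it, and the data has to come from whatever request the JavaScript makes.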

Upvotes: 0
