Thilina De Silva

Reputation: 391

Retrieve Amazon Reviews for a particular product

I'm currently working on a research project that needs to analyze reviews of a particular product to get an overall picture of it.

I heard that Amazon is a good place to get product reviews and comments. Is there any way to retrieve those user reviews from Amazon via an API? I tried several Python scripts, but none of them worked. If there is no API for retrieving this data, do I need to write a spider?

Are there any approaches/places to retrieve user reviews for a given product?

Upvotes: 1

Views: 3624

Answers (3)

A.M.

Reputation: 11

If you need to scrape regularly and go through several product pages, I would suggest extending the ScrapeHero script to use https://pypi.org/project/fake-useragent/ for the "headers" passed to the request.

Otherwise, if you only need to download reviews once in a while, you can use https://reviewi.me. It's a free, web-based tool that works with several Amazon websites and can export to CSV, XLSX, and JSON.
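For the regular-scraping case, the idea of varying the "headers" can be sketched as follows. This is only a minimal illustration, not the fake-useragent package itself: the `USER_AGENTS` pool and the `random_headers` helper are hypothetical names, and the strings in the pool are example values standing in for whatever fake-useragent would generate.

```python
import random

# Hypothetical pool of browser User-Agent strings (example values only).
# The fake-useragent package could supply these strings instead.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 Safari/605.1.15",
]


def random_headers(pool=USER_AGENTS, rng=random):
    """Return a headers dict whose User-Agent is picked at random from the pool."""
    return {"User-Agent": rng.choice(pool)}
```

A call such as `requests.get(url, headers=random_headers())` would then send a different User-Agent from one request to the next. With fake-useragent installed, its documentation describes a `UserAgent().random` property that could replace the hardcoded pool.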

Upvotes: 1

Dubliyu

Reputation: 61

www.Scrapehero.com has a great tutorial on how to scrape Amazon product details: How To Scrape Amazon Product Details and Pricing using Python

The complete plain text code they use is ... Products are identified by their ASIN, so change the list values to the products you are interested in watching.

from lxml import html
import csv
import json
import os
import requests
from time import sleep


def AmzonParser(url, max_retries=5):
    # Send a browser-like User-Agent instead of the requests default.
    headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36'}
    # Retry a bounded number of times instead of looping forever on a bad response.
    for _ in range(max_retries):
        sleep(3)
        try:
            # Fetch inside the loop so that every retry issues a fresh request.
            page = requests.get(url, headers=headers)
            if page.status_code != 200:
                raise ValueError('captcha')

            doc = html.fromstring(page.content)
            XPATH_NAME = '//h1[@id="title"]//text()'
            XPATH_SALE_PRICE = '//span[contains(@id,"ourprice") or contains(@id,"saleprice")]/text()'
            XPATH_ORIGINAL_PRICE = '//td[contains(text(),"List Price") or contains(text(),"M.R.P") or contains(text(),"Price")]/following-sibling::td/text()'
            XPATH_CATEGORY = '//a[@class="a-link-normal a-color-tertiary"]//text()'
            XPATH_AVAILABILITY = '//div[@id="availability"]//text()'

            RAW_NAME = doc.xpath(XPATH_NAME)
            RAW_SALE_PRICE = doc.xpath(XPATH_SALE_PRICE)
            RAW_CATEGORY = doc.xpath(XPATH_CATEGORY)
            RAW_ORIGINAL_PRICE = doc.xpath(XPATH_ORIGINAL_PRICE)
            RAW_AVAILABILITY = doc.xpath(XPATH_AVAILABILITY)

            # Collapse whitespace and join the text fragments of each field.
            NAME = ' '.join(''.join(RAW_NAME).split()) if RAW_NAME else None
            SALE_PRICE = ' '.join(''.join(RAW_SALE_PRICE).split()).strip() if RAW_SALE_PRICE else None
            CATEGORY = ' > '.join([i.strip() for i in RAW_CATEGORY]) if RAW_CATEGORY else None
            ORIGINAL_PRICE = ''.join(RAW_ORIGINAL_PRICE).strip() if RAW_ORIGINAL_PRICE else None
            AVAILABILITY = ''.join(RAW_AVAILABILITY).strip() if RAW_AVAILABILITY else None

            # Fall back to the sale price when no list price is shown.
            if not ORIGINAL_PRICE:
                ORIGINAL_PRICE = SALE_PRICE

            data = {
                'NAME': NAME,
                'SALE_PRICE': SALE_PRICE,
                'CATEGORY': CATEGORY,
                'ORIGINAL_PRICE': ORIGINAL_PRICE,
                'AVAILABILITY': AVAILABILITY,
                'URL': url,
            }
            return data
        except Exception as e:
            print(e)
    # Every attempt failed; the caller receives None for this URL.
    return None


def ReadAsin():
    # AsinList = csv.DictReader(open(os.path.join(os.path.dirname(__file__),"Asinfeed.csv")))
    AsinList = ['B0046UR4F4',
                'B00JGTVU5A',
                'B00GJYCIVK',
                'B00EPGK7CQ',
                'B00EPGKA4G',
                'B00YW5DLB4',
                'B00KGD0628',
                'B00O9A48N2',
                'B00O9A4MEW',
                'B00UZKG8QU']
    extracted_data = []
    for i in AsinList:
        url = "http://www.amazon.com/dp/" + i
        print("Processing: " + url)
        extracted_data.append(AmzonParser(url))
        sleep(5)
    # Write all results to data.json; the with-block closes the file.
    with open('data.json', 'w') as f:
        json.dump(extracted_data, f, indent=4)


if __name__ == "__main__":
    ReadAsin()
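
Since the script builds each product URL by appending an ASIN to a fixed prefix, a small check before any request is sent can catch typos in the list. The sketch below is only an illustration: `ASIN_PATTERN` and `product_url` are hypothetical names, and the rule that an ASIN is exactly 10 letters or digits is an assumption here, not something the tutorial states.

```python
import re

# Assumption: an ASIN is exactly 10 uppercase letters or digits.
ASIN_PATTERN = re.compile(r"[A-Z0-9]{10}")


def product_url(asin):
    """Build the product-page URL used by the script for a single ASIN."""
    asin = asin.strip().upper()
    if not ASIN_PATTERN.fullmatch(asin):
        raise ValueError("not a valid ASIN: %r" % asin)
    return "http://www.amazon.com/dp/" + asin
```

For example, `product_url('B0046UR4F4')` gives the same URL the loop in ReadAsin builds for the first entry, while a malformed entry raises ValueError before any page is requested.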

Upvotes: 1

Ethan Peters

Reputation: 54

Beautiful Soup is a Python library for parsing the HTML of a page, such as an Amazon product page, and navigating the resulting tree. Once you have downloaded the page, it should let you extract the review sections directly. Here is a link to the documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
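
As a rough sketch of what that parsing could look like, the example below runs Beautiful Soup over a small inline HTML string. The markup, the class names `review`, `review-rating` and `review-text`, and the `extract_reviews` helper are all made up for illustration; the real Amazon page structure is different and would need to be inspected first.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration; real review pages are structured differently.
SAMPLE_HTML = """
<div class="review">
  <span class="review-rating">5.0 out of 5 stars</span>
  <span class="review-text">Works great, battery lasts all day.</span>
</div>
<div class="review">
  <span class="review-rating">2.0 out of 5 stars</span>
  <span class="review-text">Stopped charging after a month.</span>
</div>
"""


def extract_reviews(page_html):
    """Return one dict per review block found in the given HTML string."""
    soup = BeautifulSoup(page_html, "html.parser")
    reviews = []
    for block in soup.find_all("div", class_="review"):
        rating = block.find("span", class_="review-rating")
        text = block.find("span", class_="review-text")
        reviews.append({
            "rating": rating.get_text(strip=True) if rating else None,
            "text": text.get_text(strip=True) if text else None,
        })
    return reviews
```

Calling `extract_reviews(SAMPLE_HTML)` yields a list of two dicts, one per review block. For a live page, the HTML string would come from a downloaded response, and the selectors would have to match the tags actually present on that page.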

Upvotes: -1
