Volatil3

Reputation: 14988

Amazon Scraping returns 503

I am using BeautifulSoup and setting a User-Agent header when making requests. Amazon is blocking my calls with a 503 even though I added a sleep between requests to avoid it. Is there any way to deal with it? I know there's an API available, but I doubt it would give me what I am looking for.

What I want is to get product details based on an ASIN, including all price offers from different sellers that use the Amazon Prime shipping option; an example URL is given here.
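Roughly, what I am doing looks like this (a minimal sketch of my setup; the ASIN, User-Agent string, and sleep interval below are just placeholders):

import time

import requests
from bs4 import BeautifulSoup

HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}  # placeholder UA string

def fetch_product(asin):
    # Product page for the given ASIN; this is the request that comes back with 503.
    url = f"https://www.amazon.com/dp/{asin}"
    response = requests.get(url, headers=HEADERS)
    time.sleep(2)  # pause between requests, which still doesn't prevent the block
    response.raise_for_status()
    return BeautifulSoup(response.text, "html.parser")

soup = fetch_product("B00EXAMPLE")  # placeholder ASIN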

Upvotes: 6

Views: 6339

Answers (4)

g.hagmt

Reputation: 73

I had the same issue. For anyone out there who doesn't wish to deal with APIs, creating developer accounts and such, or in case there isn't an API because you're working with something other than Amazon, I believe Selenium is the easiest and most reliable solution. You need to download chromedriver.exe (or the driver for whatever browser you decide to use) and provide its location via the executable_path argument (I personally just put it in the same folder as the script). Here's a quick example:

from bs4 import BeautifulSoup
from selenium import webdriver


def get_webdriver():
    # Point Selenium at the chromedriver executable (assumed here to be in
    # the working directory) and pass in any Chrome options you need.
    chrome_options = webdriver.ChromeOptions()
    chrome_options.headless = False  # set to True to run without a visible browser window
    return webdriver.Chrome(executable_path='chromedriver.exe', options=chrome_options)


def get_soup(url, driver):
    # Load the page in the real browser, then parse the rendered HTML.
    driver.get(url)
    return BeautifulSoup(driver.page_source, "html.parser")


driver = get_webdriver()
soup = get_soup("https://amazon.com", driver)
driver.quit()
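Note that on newer Selenium releases (4.x) the executable_path and chrome_options keywords are deprecated and eventually removed; a rough equivalent under the same assumption (chromedriver.exe sitting next to the script) would be:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

# Selenium 4 style: the driver path goes into a Service object and the
# options keyword replaces chrome_options. From Selenium 4.6 on you can
# even drop the Service and let Selenium Manager fetch a driver for you.
options = Options()
service = Service("chromedriver.exe")
driver = webdriver.Chrome(service=service, options=options)

driver.get("https://amazon.com")
soup = BeautifulSoup(driver.page_source, "html.parser")
driver.quit()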

Upvotes: 0

riemann_lebesgue

Reputation: 329

If you are scraping an Amazon search results page, e.g. this, then Amazon only requires that you have a User-Agent set and that you don't scrape too aggressively (i.e. it will block/rate-limit you if you send too many requests from one IP address).

If you are scraping a product page, e.g. this, the rate limits are much stricter per IP.

If your use case is not very intense, the Amazon Selling Partner API is your best bet; it only requires a professional selling account to use (see https://developer-docs.amazon.com/sp-api/) and typically has rate limits of 1-10 requests per second for data like prices, barcodes, etc.
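As a rough sketch of what "not too aggressive" looks like in practice (the User-Agent string, delays, and retry count below are arbitrary choices, not anything Amazon documents):

import time

import requests

HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}  # any realistic UA string

def polite_get(url, retries=3, delay=5):
    # Space requests out and back off when a 503 comes back, instead of
    # hammering the same endpoint from one IP.
    for attempt in range(retries):
        response = requests.get(url, headers=HEADERS)
        if response.status_code != 503:
            return response
        time.sleep(delay * (attempt + 1))  # simple linear backoff
    response.raise_for_status()

page = polite_get("https://www.amazon.com/s?k=example+search")  # placeholder search URL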

Upvotes: 2

amalu

Reputation: 11

Amazon has bot detection and the like in place to prevent web scrapers and scraping automation. Just search for your user agent and provide it as a dictionary when using the get method:

import requests

url = "<link>"
header = {
    "User-Agent": "<search result of user agent>",
}
page = requests.get(url, headers=header)

Upvotes: 1

Michael - sqlbot

Reputation: 179134

Is there any way to deal with it?

Yes... you comply with their acceptable use policy.

If it's not available from an API, you're not authorized to scrape it.

Even if you successfully scrape it, you're still not authorized to use it.

This license does not include any resale or commercial use of any Amazon Service, or its contents; any collection and use of any product listings, descriptions, or prices; any derivative use of any Amazon Service or its contents; any downloading, copying, or other use of account information for the benefit of any third party; or any use of data mining, robots, or similar data gathering and extraction tools.

https://www.amazon.com/gp/help/customer/display.html/ref=ap_frn_condition_of_use?ie=UTF8&nodeId=508088

Upvotes: 2
