Reputation: 37
I have noticed that for some websites' API URLs, the browser receives the response via a service worker, which has caused problems when scraping those APIs.
For example, consider the following:
The data appears when the URL is pasted into a browser. However, I get a 422 error when I try to automate the collection of that data in Python with the following code:
import requests
# API URL
url = 'https://www.sephora.co.id/api/v2.3/products?filter[category]=makeup/face/bronzer&page[size]=30&page[number]=1&sort=sales&include=variants,brand'
# The response is always 422, even though the same URL works in the browser
response = requests.get(url)
print(response.status_code)
I have noticed that calling the API URL in the browser returns the response via a service worker. So my question is: is there a way to get a 200 response using the Python requests library?
Upvotes: 1
Views: 1520
Reputation: 2378
The server appears to require the Accept-Language header.
The code below now returns 200.
import requests
url = 'https://www.sephora.co.id/api/v2.3/products?filter[category]=makeup/face/bronzer&page[size]=30&page[number]=1&sort=sales&include=variants,brand'
# Without this header the server responds with 422
headers = {'Accept-Language': 'en-gb'}
response = requests.get(url, headers=headers)
print(response.status_code)  # 200
(I determined this by inspecting a successful request in the browser, copying all of its headers as-is into the Python request, and then removing them one by one.)
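The elimination procedure described above can be automated. The following is a minimal sketch, not the answerer's actual script: `minimise_headers` and `request_succeeds` are hypothetical helper names, and it assumes you have already copied the browser's request headers (for example from the developer tools Network tab) into a Python dict.

```python
from typing import Callable, Dict

Headers = Dict[str, str]


def minimise_headers(headers: Headers, succeeds: Callable[[Headers], bool]) -> Headers:
    """Drop headers one at a time, keeping only those the request needs.

    `headers` is the full set copied from a successful browser request;
    `succeeds` sends a request with the given headers and reports success.
    The input dict is not modified.
    """
    if not succeeds(headers):
        raise ValueError("request fails even with all browser headers")
    required = dict(headers)
    for name in list(headers):
        trial = {k: v for k, v in required.items() if k != name}
        if succeeds(trial):
            # The request still works without this header, so discard it.
            required = trial
    return required


def request_succeeds(url: str, headers: Headers) -> bool:
    """Return True if GET `url` with these headers returns HTTP 200.

    Uses the third-party requests package, which also adds its own default
    headers (e.g. User-Agent). It is imported here so that minimise_headers
    can be used without it.
    """
    import requests  # imported lazily; only needed for real network checks

    response = requests.get(url, headers=headers, timeout=10)
    return response.status_code == 200
```

For the Sephora URL you would call `minimise_headers(browser_headers, lambda h: request_succeeds(url, h))`. Because it removes headers greedily in a single pass, it keeps any header whose removal broke the request at the point it was tried; it does not search for the smallest possible combination, and it sends one request per header, so be mindful of rate limits.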
Upvotes: 2