Reputation:
I am trying to use the python requests library to get the html from this url https://www.adidas.com/api/products/EF2302/availability?sitePath=us
However every time I run my code it hangs when making the get request
header = BASE_REQUEST_HEADER
url = 'https://www.adidas.com/api/products/EF2302/availability?sitePath=us'
r = requests.get(url, headers = header)
I checked the network tab in chrome and copied all the headers used including user agent so that is not the issue. I was also able to load the page in chrome with javascript and cookies disabled.
This code works fine with other websites. I simply cant get a response from any of the adidas websites (including https://www.adidas.com/us).
Any suggestions are greatly appreciated.
Upvotes: 2
Views: 2926
Reputation: 195528
This site doesn't like the default User-Agent field supplied by requests, change it to Firefox/Chrome (I chose Firefox in my example), and you can read data successfully:
from bs4 import BeautifulSoup
import requests
import json
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0'}
url = 'https://www.adidas.com/api/products/EF2302/availability?sitePath=us'
r = requests.get(url, headers=headers)
json_data = json.loads(r.text)
print(json.dumps(json_data, indent=4))
Prints:
{
"id": "EF2302",
"availability_status": "PREORDER",
"variation_list": [
{
"sku": "EF2302_530",
"availability": 15,
"availability_status": "PREORDER",
"size": "4",
"instock_date": "2018-08-16T00:00:00.000Z"
},
{
"sku": "EF2302_550",
"availability": 15,
"availability_status": "PREORDER",
"size": "5",
"instock_date": "2018-08-16T00:00:00.000Z"
},
{
"sku": "EF2302_570",
"availability": 15,
"availability_status": "PREORDER",
"size": "6",
"instock_date": "2018-08-16T00:00:00.000Z"
},
{
"sku": "EF2302_590",
"availability": 15,
"availability_status": "PREORDER",
"size": "7",
"instock_date": "2018-08-16T00:00:00.000Z"
},
{
"sku": "EF2302_610",
"availability": 15,
"availability_status": "PREORDER",
"size": "8",
"instock_date": "2018-08-16T00:00:00.000Z"
},
{
"sku": "EF2302_630",
"availability": 15,
"availability_status": "PREORDER",
"size": "9",
"instock_date": "2018-08-16T00:00:00.000Z"
},
{
"sku": "EF2302_650",
"availability": 15,
"availability_status": "PREORDER",
"size": "10",
"instock_date": "2018-08-16T00:00:00.000Z"
},
{
"sku": "EF2302_670",
"availability": 15,
"availability_status": "PREORDER",
"size": "11",
"instock_date": "2018-08-16T00:00:00.000Z"
},
{
"sku": "EF2302_690",
"availability": 15,
"availability_status": "PREORDER",
"size": "12",
"instock_date": "2018-08-16T00:00:00.000Z"
},
{
"sku": "EF2302_710",
"availability": 15,
"availability_status": "PREORDER",
"size": "13",
"instock_date": "2018-08-16T00:00:00.000Z"
}
]
}
Upvotes: 2
Reputation: 81
One different is the User-agent field, which requests sets as User-Agent: python-requests/2.18.4
Adidas may be just dropping these http requests to stop people abusing their system.
(btw, it also happens for just www.adidas.com)
I reproduced the issue and took a look at wireshark packet sniffer. It seems the http request is good and there is tcp acknowledgement but no http reply.
Upvotes: 1