Reputation: 563
So I am trying to scrape an Uber Eats page. The URL is:
https://www.ubereats.com/ann-arbor/food-delivery/chipotle-mexican-grill-3354-washtenaw-ave-ste-a/zbEbQIdWT2-n6iTWqjz55Q
I am using the requests library. I know the page exists because I can visit the link in a browser, but the script returns a 404 error. Solutions online say to include headers; I have tried that, to no avail.
Here is my code:
from bs4 import BeautifulSoup
import requests
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:63.0) Gecko/20100101 Firefox/63.0'}
page = requests.get('https://www.ubereats.com/ann-arbor/food-delivery/chipotle-mexican-grill-3354-washtenaw-ave-ste-a/zbEbQIdWT2-n6iTWqjz55Q', headers=headers)
print(page.content)
What am I missing or doing wrong?
Upvotes: 1
Views: 585
Reputation: 142641
A real web browser sends many different values in its headers, not only User-Agent.
Many servers check only User-Agent in order to send the correct HTML for a desktop or mobile device, but some servers may check other headers as well.
This page needs the Accept header; the code below works even without User-Agent:
import requests

headers = {
    # 'User-Agent': 'Mozilla/5.0',
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    # "Accept-Encoding": "gzip, deflate, br",
    # "Accept-Language": "en-US;q=0.7,en;q=0.3",
    # ... other headers ...
}

url = 'https://www.ubereats.com/ann-arbor/food-delivery/chipotle-mexican-grill-3354-washtenaw-ave-ste-a/zbEbQIdWT2-n6iTWqjz55Q'

page = requests.get(url, headers=headers)

print(page.status_code)
print(page.text)
#print(page.content)
You can use DevTools in Firefox/Chrome (tab Network) to see all requests sent from the browser to the server, and all the headers/data sent with them. You can then copy those headers and test them in your code.
Upvotes: 3