Tendekai Muchenje

Reputation: 563

'requests' scrape of URL returns 404 even with headers and page definitely exists

I am trying to scrape an Ubereats page. The url is: https://www.ubereats.com/ann-arbor/food-delivery/chipotle-mexican-grill-3354-washtenaw-ave-ste-a/zbEbQIdWT2-n6iTWqjz55Q

I am using the requests library. I know the page exists because I can visit the link in a browser, but the script returns a 404 error. Solutions online say to include headers, which I have tried to no avail.

Here is my code:

from bs4 import BeautifulSoup
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:63.0) Gecko/20100101 Firefox/63.0'}
page = requests.get('https://www.ubereats.com/ann-arbor/food-delivery/chipotle-mexican-grill-3354-washtenaw-ave-ste-a/zbEbQIdWT2-n6iTWqjz55Q', headers=headers)

print(page.content)

What am I missing or doing wrong?

Upvotes: 1

Views: 585

Answers (1)

furas

Reputation: 142641

A real web browser sends many different values in its headers, not only User-Agent.

Many servers check only User-Agent so they can send the correct HTML for a desktop or mobile device, but some servers may check other headers as well.

This page needs the Accept header; the code works even without User-Agent:

import requests

headers = {
#    'User-Agent': 'Mozilla/5.0',
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
#    "Accept-Encoding": "gzip, deflate, br",
#    "Accept-Language": "en-US;q=0.7,en;q=0.3",
# ... other headers ...
}

url = 'https://www.ubereats.com/ann-arbor/food-delivery/chipotle-mexican-grill-3354-washtenaw-ave-ste-a/zbEbQIdWT2-n6iTWqjz55Q'
page = requests.get(url, headers=headers)

print(page.status_code)
print(page.text) 
#print(page.content)

You can use DevTools in Firefox/Chrome (Network tab) to see all requests from the browser to the server and all headers/data sent with them. You can then copy those headers and test them in your code.
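As a side note, the original code imports BeautifulSoup but never uses it. Once the request succeeds, `page.text` can be fed straight to BeautifulSoup. A minimal sketch of that step, using a made-up stand-in string for `page.text` so it runs without a network call (the real Uber Eats markup will differ):

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for page.text from the request above --
# the real Uber Eats HTML will be far more complex.
sample_html = """
<html>
  <head><title>Chipotle Mexican Grill | Uber Eats</title></head>
  <body>
    <h1>Chipotle Mexican Grill</h1>
  </body>
</html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Extract the page title and the first <h1> heading.
print(soup.title.get_text())       # Chipotle Mexican Grill | Uber Eats
print(soup.find("h1").get_text())  # Chipotle Mexican Grill
```

With the real page you would pass `page.text` to `BeautifulSoup(...)` instead of `sample_html`, and pick selectors based on what DevTools shows in the served HTML.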

Upvotes: 3
