Invalid URL when using Python Requests

Question

I am trying to access the API that returns program data on this page when you scroll down and new tiles are displayed. In Chrome DevTools I found the API being called and put together the following Requests script:

```python
import requests
import cloudscraper

session = requests.session()

url = 'https://ie.api.atom.nowtv.com/adapter-atlas/v3/query/node?slug=/entertainment/collections/all-entertainment&represent=(items[take=60](items(items[select_list=iceberg])))'

session.headers = {
    'Host': 'https://www.nowtv.com',
    'Connection': 'keep-alive',
    'Accept': 'application/json, text/javascript, */*',
    'X-Requested-With': 'XMLHttpRequest',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36',
    'Referer': 'https://www.nowtv.com',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8'
}

scraper = cloudscraper.create_scraper(sess=session)
r = scraper.get(url)

data = r.content
print(data)

session.close()
```

This returns only the following:

```
b' Invalid URL Invalid URL The requested URL "[no URL]", is invalid.

Reference #9.3c0f0317.1608324989.5902cff '
```

I assume the issue is the bracketed part at the end of the URL (the `represent` parameter), but I am not sure how to handle brackets in a Requests call. Can anyone provide the correct syntax?

Thanks

alecxe · Accepted Answer

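The issue is the `Host` session header value: don't set it. A `Host` header holds only a host name (plus an optional port), not a URL with a scheme, and `https://www.nowtv.com` also names a different host than the API you are calling. Requests fills in the correct value from the request URL automatically.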
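A minimal sketch of the question's session with the `Host` entry simply left out. It uses `session.headers.update(...)` rather than replacing the whole dict, so Requests' default headers stay in place; the helper name `make_session` is just for illustration. `Session.prepare_request` lets you inspect what would be sent without making a network call:

```python
from urllib.parse import urlsplit

import requests

API_URL = "https://ie.api.atom.nowtv.com/adapter-atlas/v3/query/node"


def make_session():
    """Session with the question's headers, minus the invalid Host entry."""
    session = requests.Session()
    # update() keeps Requests' defaults (Accept-Encoding, Connection, ...).
    # There is no 'Host' key, so it is derived from each request URL.
    session.headers.update({
        "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                       "AppleWebKit/537.36 (KHTML, like Gecko) "
                       "Chrome/72.0.3626.119 Safari/537.36"),
        "Referer": "https://www.nowtv.com",
        "Accept-Language": "en-GB,en-US;q=0.9,en;q=0.8",
    })
    return session


session = make_session()
# Build the request locally (nothing is sent) to see what would go out.
prepared = session.prepare_request(requests.Request("GET", API_URL))
print("Host" in prepared.headers)     # False: no explicit Host header
print(urlsplit(prepared.url).netloc)  # ie.api.atom.nowtv.com
```

The host name printed last is what ends up in the `Host` header when the request is actually sent.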

That alone should be enough, but I've also done a few additional things:

Added the `X-SkyOTT-*` headers:

```python
session.headers.update({
    'X-SkyOTT-Proposition': 'NOWTV',
    'X-SkyOTT-Language': 'en',
    'X-SkyOTT-Platform': 'PC',
    'X-SkyOTT-Territory': 'GB',
    'X-SkyOTT-Device': 'COMPUTER'
})
```
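Headers set on the session are merged into every request it makes. A quick offline check, sketched below, prepares a request and prints the headers that would be sent. The values are the ones from this answer; whether the API actually requires all of them is an assumption not tested here:

```python
import requests

# Values copied from the answer; which ones the API requires is unverified.
SKYOTT_HEADERS = {
    "X-SkyOTT-Proposition": "NOWTV",
    "X-SkyOTT-Language": "en",
    "X-SkyOTT-Platform": "PC",
    "X-SkyOTT-Territory": "GB",
    "X-SkyOTT-Device": "COMPUTER",
}

session = requests.Session()
session.headers.update(SKYOTT_HEADERS)

# Prepare (without sending) a request to see the merged headers.
prepared = session.prepare_request(
    requests.Request("GET", "https://ie.api.atom.nowtv.com/adapter-atlas/v3/query/node")
)
for name in SKYOTT_HEADERS:
    print(name, "=", prepared.headers[name])

# Header lookups on Requests' header dicts are case-insensitive.
print(prepared.headers["x-skyott-territory"])  # GB
```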

Visited the main page first, without the `X-Requested-With` header and with a broader `Accept` header value:
```
text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8 
```
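The answer doesn't show code for this step, so here is a sketch. It assumes the point of the visit is to let the session pick up whatever cookies the main page sets; that is a guess, not something confirmed above. Requests drops a session-level header when the per-request value is `None`, which is a convenient way to omit `X-Requested-With` for just this call. The names `HOME_HEADERS` and `warm_up` are hypothetical:

```python
import requests

HOME_URL = "https://www.nowtv.com"
HTML_ACCEPT = ("text/html,application/xhtml+xml,application/xml;q=0.9,"
               "image/webp,*/*;q=0.8")
# A None value removes that header from this request only,
# even if it is set on the session.
HOME_HEADERS = {"Accept": HTML_ACCEPT, "X-Requested-With": None}


def warm_up(session, timeout=10):
    """Visit the main page so the session stores any cookies it sets."""
    response = session.get(HOME_URL, headers=HOME_HEADERS, timeout=timeout)
    response.raise_for_status()
    return response


# Offline check of the headers the warm-up request would carry.
session = requests.Session()
session.headers["X-Requested-With"] = "XMLHttpRequest"
prepared = session.prepare_request(
    requests.Request("GET", HOME_URL, headers=HOME_HEADERS)
)
print("X-Requested-With" in prepared.headers)  # False
print(prepared.headers["Accept"])
```

Calling `warm_up(session)` before the API requests would perform the real visit.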

I've also used `params` for the GET query parameters. I don't think you have to, but it's cleaner:

```
In [33]: url = 'https://ie.api.atom.nowtv.com/adapter-atlas/v3/query/node'

In [34]: response = session.get(url, params={
             'slug': '/entertainment/collections/all-entertainment',
             'represent': '(items[take=60,skip=2340](items(items[select_list=iceberg])))'
         }, headers={
             'Accept': 'application/json, text/plain, */*',
             'X-Requested-With': 'XMLHttpRequest'
         })

In [35]: response
Out[35]:

In [36]: response.text
Out[36]: '{"links":{"self":"/adapter-atlas/v3/query/node/e5b0e516-2b84-11e9-b860-83982be1b6a6"},"id":"e5b0e516-2b84-11e9-b860-83982be1b6a6","type":"CATALOGUE/COLLECTION","segmentId":"","segmentName":"default","childTypes":{"next_items":{"nodeTypes":["ASSET/PROGRAMME","CATALOGUE/SERIES"],"count":68},"items":{"nodeTypes":["ASSET/PROGRAMME","CATALOGUE/SERIES"],"count":2376},"curation-config":{"nodeTypes":["CATALOGUE/CURATIONCONFIG"],"count":1}},"attributes":{"childNodeTyp
         ...
```
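If you want every tile rather than one page, the `take`/`skip` values in `represent` look like a page size and an offset. The sketch below assumes exactly that (it is not documented here) and reads the total from `childTypes.items.count` in the response. With the count of 2376 shown above and `take=60`, the last offset is 2340, the same `skip` used in the query. The helper names are hypothetical, and `params` takes care of percent-encoding the brackets:

```python
from urllib.parse import parse_qs, urlsplit

import requests

API_URL = "https://ie.api.atom.nowtv.com/adapter-atlas/v3/query/node"
SLUG = "/entertainment/collections/all-entertainment"
JSON_HEADERS = {
    "Accept": "application/json, text/plain, */*",
    "X-Requested-With": "XMLHttpRequest",
}


def represent_param(take, skip):
    """Build the 'represent' value in the same format as the answer."""
    return f"(items[take={take},skip={skip}](items(items[select_list=iceberg])))"


def page_params(take, skip):
    """Query parameters for one page; Requests percent-encodes them."""
    return {"slug": SLUG, "represent": represent_param(take, skip)}


def page_offsets(total, take):
    """Skip values 0, take, 2*take, ... below total."""
    return range(0, total, take)


def item_count(payload):
    """Total item count from the parsed JSON body."""
    return payload["childTypes"]["items"]["count"]


def fetch_all_pages(session, take=60, timeout=10):
    """Hypothetical pager: assumes skip/take behave like offset/limit."""
    def get_page(skip):
        response = session.get(API_URL, params=page_params(take, skip),
                               headers=JSON_HEADERS, timeout=timeout)
        response.raise_for_status()
        return response.json()

    pages = [get_page(0)]  # first page also tells us the total
    for skip in page_offsets(item_count(pages[0]), take)[1:]:
        pages.append(get_page(skip))
    return pages


# Offline check: the encoded URL decodes back to the bracketed value.
url = requests.Request("GET", API_URL, params=page_params(60, 2340)).prepare().url
print(parse_qs(urlsplit(url).query)["represent"][0])
print(list(page_offsets(2376, 60))[-1])  # 2340
```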
