Reputation: 4152
I am trying to access the API that returns program data on this page as you scroll down and new tiles are displayed. Looking in Chrome DevTools, I found the API being called and put together the following Requests script:
import cloudscraper
import requests
session = requests.session()
url = 'https://ie.api.atom.nowtv.com/adapter-atlas/v3/query/node?slug=/entertainment/collections/all-entertainment&represent=(items[take=60](items(items[select_list=iceberg])))'
session.headers = {
'Host': 'https://www.nowtv.com',
'Connection': 'keep-alive',
'Accept': 'application/json, text/javascript, */*',
'X-Requested-With': 'XMLHttpRequest',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36',
'Referer': 'https://www.nowtv.com',
'Accept-Encoding': 'gzip, deflate',
'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8'
}
scraper = cloudscraper.create_scraper(sess=session)
r = scraper.get(url)
data = r.content
print(data)
session.close()
This is returning the following only:
b'<HTML><HEAD>\n<TITLE>Invalid URL</TITLE>\n</HEAD><BODY>\n<H1>Invalid URL</H1>\nThe requested URL "[no URL]", is invalid.<p>\nReference #9.3c0f0317.1608324989.5902cff\n</BODY></HTML>\n'
I assume the issue is the part at the end of the URL that is in parentheses and square brackets. However, I am not sure how to handle these in a Requests call. Can anyone provide the correct syntax?
Thanks
Upvotes: 0
Views: 690
Reputation: 474151
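The issue is the Host session header value: don't set it. The value 'https://www.nowtv.com' is a full URL rather than a host name, and it does not match the API host in the request URL anyway. When you leave the header out, the correct Host is derived from the URL for you.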
That should be enough. But I've done some additional things as well:
Add the X-* headers:
session.headers.update({
'X-SkyOTT-Proposition': 'NOWTV',
'X-SkyOTT-Language': 'en',
'X-SkyOTT-Platform': 'PC',
'X-SkyOTT-Territory': 'GB',
'X-SkyOTT-Device': 'COMPUTER'
})
Visit the main page without the XHR header set and with a broader Accept header value:
text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
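Put together, the steps above might look roughly like the sketch below. The helper names make_session and warm_up are made up for illustration, and the User-Agent is simply the one from the question. Whether the warm-up visit is actually required is not something I have verified; dropping the Host header is the part that mattered.

```python
import requests

MAIN_PAGE = 'https://www.nowtv.com'
# Broader Accept value used only for the initial, browser-like page visit.
HTML_ACCEPT = ('text/html,application/xhtml+xml,application/xml;q=0.9,'
               'image/webp,*/*;q=0.8')


def make_session():
    """Build a session WITHOUT a Host header (hypothetical helper)."""
    session = requests.Session()
    session.headers.update({
        'User-Agent': ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                       'AppleWebKit/537.36 (KHTML, like Gecko) '
                       'Chrome/72.0.3626.119 Safari/537.36'),
        'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8',
        'Referer': 'https://www.nowtv.com',
        # The X-* headers listed above.
        'X-SkyOTT-Proposition': 'NOWTV',
        'X-SkyOTT-Language': 'en',
        'X-SkyOTT-Platform': 'PC',
        'X-SkyOTT-Territory': 'GB',
        'X-SkyOTT-Device': 'COMPUTER',
    })
    return session


def warm_up(session):
    """Visit the main page like a browser would: no X-Requested-With header.

    This performs a real network request, so it is not called here.
    """
    return session.get(MAIN_PAGE, headers={'Accept': HTML_ACCEPT}, timeout=10)


session = make_session()
# Host is not set by hand; it is filled in from the URL when a request is sent.
print('Host' in session.headers)
```

Calling warm_up(session) before the API request would reproduce the "visit the main page first" step; any cookies it receives stay on the session.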
I've also used params for the GET parameters. You don't have to do it, I think. It's just cleaner:
In [33]: url = 'https://ie.api.atom.nowtv.com/adapter-atlas/v3/query/node'
In [34]: response = session.get(url, params={
'slug': '/entertainment/collections/all-entertainment',
'represent': '(items[take=60,skip=2340](items(items[select_list=iceberg])))'
}, headers={
'Accept': 'application/json, text/plain, */*',
'X-Requested-With':'XMLHttpRequest'
})
In [35]: response
Out[35]: <Response [200]>
In [36]: response.text
Out[36]: '{"links":{"self":"/adapter-atlas/v3/query/node/e5b0e516-2b84-11e9-b860-83982be1b6a6"},"id":"e5b0e516-2b84-11e9-b860-83982be1b6a6","type":"CATALOGUE/COLLECTION","segmentId":"","segmentName":"default","childTypes":{"next_items":{"nodeTypes":["ASSET/PROGRAMME","CATALOGUE/SERIES"],"count":68},"items":{"nodeTypes":["ASSET/PROGRAMME","CATALOGUE/SERIES"],"count":2376},"curation-config":{"nodeTypes":["CATALOGUE/CURATIONCONFIG"],"count":1}},"attributes":{"childNodeTyp
...
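If you want to check what params does to the brackets and parentheses without hitting the server, you can prepare the request offline and inspect the resulting URL. This is only a sketch: build_query_url is a made-up helper, and treating take/skip as page size and offset is my assumption based on the values above, not something documented.

```python
import requests
from urllib.parse import parse_qs, urlsplit

API_URL = 'https://ie.api.atom.nowtv.com/adapter-atlas/v3/query/node'


def build_query_url(take=60, skip=0):
    """Return the final URL requests would send (hypothetical helper)."""
    params = {
        'slug': '/entertainment/collections/all-entertainment',
        # Assumption: take/skip act as page size and offset.
        'represent': '(items[take={},skip={}]'
                     '(items(items[select_list=iceberg])))'.format(take, skip),
    }
    # prepare() builds the request without sending anything over the network.
    prepared = requests.Request('GET', API_URL, params=params).prepare()
    return prepared.url


url = build_query_url(take=60, skip=2340)
print(url)

# Decoding the query string gives back exactly what was passed in params.
decoded = parse_qs(urlsplit(url).query)
print(decoded['represent'][0])
```

The printed URL shows the special characters percent-encoded, and decoding it returns the original represent string, so nothing about the brackets needs special handling on your side.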
Upvotes: 1