Reputation: 41
I am trying to scrape https://registry.verra.org/app/search/VCS/All%20Projects for a school project. I want to reproduce what the "download excel" button does by replicating the POST request it sends in the background.
Here's what I have so far.
import requests
import datetime as dt
url_back = 'https://registry.verra.org/uiapi/resource/resource/search?$skip=0&count=true&$format=excel&$exportFileName=allprojects.xlsx'
data = {"program":"VCS",
"resourceStatuses":["VCS_EX_CRD_PRD_VER_REQUESTED","VCS_EX_CRD_PRD_REQUESTED",
"VCS_EX_REGISTERED","VCS_EX_REG_VER_APPR_REQUESTED",
"VCS_EX_REGISTRATION_REQUESTED","VCS_EX_REJ",
"VCS_EX_UNDER_DEVELOPMENT_CLD","VCS_EX_UNDER_DEVELOPMENT_OPN",
"VCS_EX_UNDER_VALIDATION_CLD","VCS_EX_UNDER_VALIDATION_OPN",
"VCS_EX_CRED_TRANS_FRM_OTHER_PROG","VCS_EX_WITHDRAWN"]}
headers = {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate, br",
"Accept-Language": "en-US,en;q=0.9",
"Connection": "keep-alive",
"Content-Length": "369",
"Content-Type": "application/json",
"Cookie": "fpestid=9g1E7EZczSniadmveW8TL8DIBB_w-MDFov_fr0DQqgBD46kgkoVSzIdQHKP-hSxMbBr4tg; _ga=GA1.2.1884498504.1652482731; _gid=GA1.2.1741997157.1652482731; ASPSESSIONIDQERRTRAR=BFIILIADNEINGJAKKMCJGKKO",
"Host": "registry.verra.org",
"Origin": "https://registry.verra.org",
"Referer": "https://registry.verra.org/app/search/VCS/All%20Projects",
"Sec-Fetch-Dest": "empty",
"Sec-Fetch-Mode": "cors",
"Sec-Fetch-Site": "same-origin",
"User-Agent": "Mozilla/5.0 (X11; CrOS x86_64 8172.45.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.64 Safari/537.36",
"sec-ch-ua-mobile": "?1",
"sec-ch-ua-platform": "Android"
}
response = requests.post(url_back, data=data, headers=headers)
print(response)
with open('dwnld.xlsx', 'wb') as f:
    f.write(response.content)
However, the request returns a 406 error every time, even though I am using "*/*" in the Accept header and a valid "User-Agent" that shouldn't be blocked. Any ideas as to why the POST does not return a real response?
Upvotes: 1
Views: 292
Reputation: 33335
headers = {
"Accept-Encoding": "gzip, deflate, br",
"Accept-Language": "en-US,en;q=0.9",
...
You've told the website that you will only accept responses that use these specific encodings, and these specific languages.
But the website can't deliver those. So it returns 406, telling you that it can't meet your requirements.
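To check which of these headers requests already sends by default, its defaults can be printed without touching the network. This is only a small sketch; the exact values (in particular Accept-Encoding and User-Agent) depend on the installed requests version and optional packages.

```python
import requests

# Headers that requests attaches to a request when none are passed explicitly.
defaults = requests.utils.default_headers()

for name, value in defaults.items():
    print(f"{name}: {value}")
```

If the defaults already cover Accept and Accept-Encoding, the copied browser values can be dropped when narrowing down the 406.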
Upvotes: 1
Reputation: 195438
Try using the json= parameter instead of data=. The headers= parameter isn't necessary:
import requests
url = "https://registry.verra.org/uiapi/resource/resource/search?%24skip=0&count=true&%24format=excel&%24exportFileName=allprojects.xlsx"
payload = {
    "program": "VCS",
    "resourceStatuses": [
        "VCS_EX_CRD_PRD_VER_REQUESTED",
        "VCS_EX_CRD_PRD_REQUESTED",
        "VCS_EX_REGISTERED",
        "VCS_EX_REG_VER_APPR_REQUESTED",
        "VCS_EX_REGISTRATION_REQUESTED",
        "VCS_EX_REJ",
        "VCS_EX_UNDER_DEVELOPMENT_CLD",
        "VCS_EX_UNDER_DEVELOPMENT_OPN",
        "VCS_EX_UNDER_VALIDATION_CLD",
        "VCS_EX_UNDER_VALIDATION_OPN",
        "VCS_EX_CRED_TRANS_FRM_OTHER_PROG",
        "VCS_EX_WITHDRAWN",
    ],
}
with open("dwnld.xlsx", "wb") as f_out:
    f_out.write(requests.post(url, json=payload).content)
This saves the export as dwnld.xlsx, which opens in LibreOffice.
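The difference between the two parameters can be seen without sending anything, by preparing both requests and comparing their bodies. The sketch below uses a shortened URL and a two-item status list purely for illustration; `form_req`, `json_req`, and `json_header` are names made up for this example.

```python
import json
import requests

# Shortened URL and payload, for illustration only.
url = "https://registry.verra.org/uiapi/resource/resource/search"
payload = {"program": "VCS", "resourceStatuses": ["VCS_EX_REGISTERED", "VCS_EX_REJ"]}
json_header = {"Content-Type": "application/json"}

# data= with a dict: the body is form-encoded, even though the
# explicitly passed header still claims it is JSON.
form_req = requests.Request("POST", url, data=payload, headers=json_header).prepare()

# json= serializes the dict to JSON and sets Content-Type on its own.
json_req = requests.Request("POST", url, json=payload).prepare()

print(form_req.headers["Content-Type"], form_req.body)
print(json_req.headers["Content-Type"], json_req.body)
```

With data= the body is a key=value string that does not match the declared application/json content type; with json= the header and the body agree.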
Upvotes: 1
Reputation: 16187
The server expects the request body to be JSON, as the Content-Type header declares, but the data= parameter form-encodes a dict. Pass the payload with the json= parameter instead, i.e. json=data:
import requests
import datetime as dt
url_back = 'https://registry.verra.org/uiapi/resource/resource/search?$skip=0&count=true&$format=excel&$exportFileName=allprojects.xlsx'
data = {"program":"VCS",
"resourceStatuses":["VCS_EX_CRD_PRD_VER_REQUESTED","VCS_EX_CRD_PRD_REQUESTED",
"VCS_EX_REGISTERED","VCS_EX_REG_VER_APPR_REQUESTED",
"VCS_EX_REGISTRATION_REQUESTED","VCS_EX_REJ",
"VCS_EX_UNDER_DEVELOPMENT_CLD","VCS_EX_UNDER_DEVELOPMENT_OPN",
"VCS_EX_UNDER_VALIDATION_CLD","VCS_EX_UNDER_VALIDATION_OPN",
"VCS_EX_CRED_TRANS_FRM_OTHER_PROG","VCS_EX_WITHDRAWN"]}
headers = {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate, br",
"Accept-Language": "en-US,en;q=0.9",
"Connection": "keep-alive",
"Content-Length": "369",
"Content-Type": "application/json",
"Cookie": "fpestid=9g1E7EZczSniadmveW8TL8DIBB_w-MDFov_fr0DQqgBD46kgkoVSzIdQHKP-hSxMbBr4tg; _ga=GA1.2.1884498504.1652482731; _gid=GA1.2.1741997157.1652482731; ASPSESSIONIDQERRTRAR=BFIILIADNEINGJAKKMCJGKKO",
"Host": "registry.verra.org",
"Origin": "https://registry.verra.org",
"Referer": "https://registry.verra.org/app/search/VCS/All%20Projects",
"Sec-Fetch-Dest": "empty",
"Sec-Fetch-Mode": "cors",
"Sec-Fetch-Site": "same-origin",
"User-Agent": "Mozilla/5.0 (X11; CrOS x86_64 8172.45.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.64 Safari/537.36",
"sec-ch-ua-mobile": "?1",
"sec-ch-ua-platform": "Android"
}
response = requests.post(url_back, json=data, headers=headers)
print(response)
# with open('dwnld.xlsx', 'wb') as f:
#     f.write(response.content)
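When the file write is enabled again, it may help to guard it so that an error page (such as the 406 body) is never saved under an .xlsx name. A minimal sketch, assuming a hypothetical helper `save_export` that accepts any object with `status_code`, `text`, and `content` attributes, as a requests response has:

```python
from pathlib import Path


def save_export(response, path):
    """Write the response body to `path` only if the server reported success.

    `save_export` is a hypothetical helper for illustration, not part of requests.
    """
    if response.status_code != 200:
        # Include the start of the body: it often explains the rejection.
        raise RuntimeError(
            f"export failed: HTTP {response.status_code}: {response.text[:200]!r}"
        )
    Path(path).write_bytes(response.content)
    return len(response.content)  # number of bytes written
```

It would be called as `save_export(response, 'dwnld.xlsx')` right after the `requests.post(...)` line above.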
Upvotes: 1