Reputation: 783
I am trying to scrape formularylookup.com, a site with information on the market for pharmaceuticals.
It requires a login: username: - password: -
I need the information for the medicine called Rybelsus.
When I look into the Inspect-> Network -> XHR I suspect there could be an easy way to get the required data form this page:
https://formularylookup.com/Formulary/Coverage?ProductId=237171&ProductName=Rybelsus&ChannelId=1&DrugTypeId=3&StateId=all&Options=SummaryCoverages
I identified this site, which might give an idea of how to connect to formularylookup.com, but I am very inexperienced with connecting to API's.
Here's my code:
import requests
from bs4 import BeautifulSoup
url ="https://api.mmitnetwork.com/Formulary/v1/Products?Name=rybelsus"
params = {
"ProductId":"237171",
"productSearch":"Rybelsus"}
headers = {
"authorization":"Bearer H-oa4ULGls2Cpu8U6hX4myixRoFIPxfj",
"Access-Token":"H-oa4ULGls2Cpu8U6hX4myixRoFIPxfj",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36",
"X-Requested-With": "XMLHttpRequest",
"Host": "formularylookup.com",
"X-NewRelic-ID": "XAYCVFZSGwcGU1lXBAI="
}
res = requests.get(url ,params=params ,headers = headers)
soup = BeautifulSoup(res.content, "lxml")
print(soup.prettify())
Which gives me the following response:
<!DOCTYPE html>
<html>
<head>
<title>
The resource cannot be found.
</title>
<meta content="width=device-width" name="viewport"/>
<style>
body {font-family:"Verdana";font-weight:normal;font-size: .7em;color:black;}
p {font-family:"Verdana";font-weight:normal;color:black;margin-top: -5px}
b {font-family:"Verdana";font-weight:bold;color:black;margin-top: -5px}
H1 { font-family:"Verdana";font-weight:normal;font-size:18pt;color:red }
H2 { font-family:"Verdana";font-weight:normal;font-size:14pt;color:maroon }
pre {font-family:"Consolas","Lucida Console",Monospace;font-size:11pt;margin:0;padding:0.5em;line-height:14pt}
.marker {font-weight: bold; color: black;text-decoration: none;}
.version {color: gray;}
.error {margin-bottom: 10px;}
.expandable { text-decoration:underline; font-weight:bold; color:navy; cursor:hand; }
@media screen and (max-width: 639px) {
pre { width: 440px; overflow: auto; white-space: pre-wrap; word-wrap: break-word; }
}
@media screen and (max-width: 479px) {
pre { width: 280px; }
}
</style>
</head>
<body bgcolor="white">
<span>
<h1>
Server Error in '/' Application.
<hr color="silver" size="1" width="100%"/>
</h1>
<h2>
<i>
The resource cannot be found.
</i>
</h2>
</span>
<font face="Arial, Helvetica, Geneva, SunSans-Regular, sans-serif ">
<b>
Description:
</b>
HTTP 404. The resource you are looking for (or one of its dependencies) could have been removed, had its name changed, or is temporarily unavailable. Please review the following URL and make sure that it is spelled correctly.
<br/>
<br/>
<b>
Requested URL:
</b>
/Formulary/v1/Products
<br/>
<br/>
<hr color="silver" size="1" width="100%"/>
<b>
Version Information:
</b>
Microsoft .NET Framework Version:4.0.30319; ASP.NET Version:4.6.1590.0
</font>
</body>
</html>
<!--
[HttpException]: The controller for path '/Formulary/v1/Products' was not found or does not implement IController.
at System.Web.Mvc.DefaultControllerFactory.GetControllerInstance(RequestContext requestContext, Type controllerType)
at System.Web.Mvc.DefaultControllerFactory.CreateController(RequestContext requestContext, String controllerName)
at System.Web.Mvc.MvcHandler.ProcessRequestInit(HttpContextBase httpContext, IController& controller, IControllerFactory& factory)
at System.Web.Mvc.MvcHandler.BeginProcessRequest(HttpContextBase httpContext, AsyncCallback callback, Object state)
at System.Web.HttpApplication.CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)
-->
<!--
This error page might contain sensitive information because ASP.NET is configured to show verbose error messages using <customErrors mode="Off"/>. Consider using <customErrors mode="On"/> or <customErrors mode="RemoteOnly"/> in production environments.-->
Update: I get an 404 error. Not sure why.
Upvotes: 1
Views: 75
Reputation: 3618
Below code will help you,
import requests
headers = {
'Accept': '*/*',
'X-Requested-With': 'XMLHttpRequest',
'Access-Token': '7Lq-KkDx2fCO_3kG90pLEpBS9Ssh62IQ',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36',
'Is-Session-Expired': 'false',
'Referer': 'https://formularylookup.com/',
}
response = requests.get('https://formularylookup.com/Formulary/Coverage?ProductId=237171&ProductName=Rybelsus&ChannelId=1&DrugTypeId=3&StateId=AL&Options=SummaryCoverages', headers=headers)
print(response.json())
Note:
'Is-Session-Expired': 'false'
is very important in the header otherwise you'll get 404 error.
See it in action here
Upvotes: 2