Reputation: 23
Hi, I am seeking some help after going back and forth trying to figure this out.
Summary:
I want to open a URL and then follow the GET request it triggers, which returns XML-like HTML content. I need to scrape that whole response.body.
Loading the page in a browser does not produce any 503 errors, but I am getting 503 errors in Scrapy.
I have tried scrapy-selenium in combination with a plain, basic spider.
I did get results on the first few tries, but after that it stopped producing results and now always returns a 503 error.
I am using the BrightData Web Unlocker proxy, and headers have also been added, so I am not sure what else I can do to make the first URL load. That first URL is the main page from which I receive this GET request. (I can also go to the API URL directly, since I have the parameters.)
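Is something like the following at the settings level the right direction? This is only a sketch using standard Scrapy setting names (RETRY_HTTP_CODES, RETRY_TIMES, DOWNLOAD_DELAY from the Scrapy docs); the values are guesses, not something I have confirmed fixes it:

```python
# Sketch: retry 503s instead of failing fast, and pace requests.
# Setting names are standard Scrapy settings; values are assumptions.
custom_settings = {
    "ROBOTSTXT_OBEY": False,
    "RETRY_HTTP_CODES": [503],   # treat 503 as retryable
    "RETRY_TIMES": 5,            # retry a few times before giving up
    "DOWNLOAD_DELAY": 2,         # slow down to look less bot-like
}
```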
import random

import scrapy
from scrapy_selenium import SeleniumRequest


class MpvticketSpider(scrapy.Spider):
    name = 'mpvticket'

    urlin = "https://mpv.tickets.com/?agency=MLB_MPV&orgid=10&pid=9016700"
    eventid = urlin.strip().split("pid=")[1]
    urlout = ("https://mpv.tickets.com/api/pvodc/v1/events/navmap/availability/"
              "?pid=" + eventid + "&agency=MLB_MPV&orgId=10&supportsVoucherRedemption=true")
    start_urls = [urlin]
    print("\n START URL BEING RUN: ", start_urls)

    def parse(self, response):
        url = "https://mpv.tickets.com/api/pvodc/v1/events/navmap/availability/?pid=9016700&agency=MLB_MPV&orgId=10&supportsVoucherRedemption=true"
        print("\n FIRST URL BEING RUN: ", url)
        username = 'lum-customer-XXXXX-zone-zone6ticket-route_err-pass_dyn'
        password = 'XXXXX'
        port = XXXX
        # Fresh session id per request so the proxy rotates exit IPs
        session_id = random.random()
        super_proxy_url = ('http://%s-country-us-session-%s:%[email protected]:%d' %
                           (username, session_id, password, port))
        headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:48.0) Gecko/20100101 Firefox/48.0'}
        yield SeleniumRequest(url=url, callback=self.parse_api,
                              meta={'proxy': super_proxy_url}, headers=headers)

    def parse_api(self, response):
        raw_data = response.text
        print(raw_data)
        # More data extraction code here. I only need help with the block
        # above, i.e. how to avoid the 503 error.
Upvotes: 0
Views: 133