Suren Gunaseelan

Reputation: 23

How do I load an XML page through Scrapy with a proxy without getting a 503 (Service Unavailable) error?

Hi, I am seeking some help after going back and forth trying to figure this out.

Summary:

I want to open a URL and then follow the GET request it issues, which returns XML-like HTML content. I need to scrape that whole response.body.

Sample: https://mpv.tickets.com/api/pvodc/v1/events/navmap/availability/?pid=9016700&agency=MLB_MPV&orgId=10&supportsVoucherRedemption=true

Loading the URL in a browser does not produce any 503 errors, but I am getting 503 errors in Scrapy.
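For debugging, one option is to stop Scrapy from silently swallowing the 503: by default the retry middleware retries 500/502/503/504 and then drops the response before your callback sees it. A minimal sketch of spider settings (these are standard Scrapy setting names; the exact retry count is just an illustration):

```python
# Sketch: settings to place in the spider's custom_settings so the 503
# response body reaches parse() for inspection instead of being discarded.
custom_settings = {
    "RETRY_TIMES": 5,                          # retry transient proxy failures a few extra times
    "RETRY_HTTP_CODES": [500, 502, 503, 504],  # codes the retry middleware re-attempts
    "HTTPERROR_ALLOWED_CODES": [503],          # let callbacks see 503 bodies for debugging
}
```

Seeing the 503 body often tells you whether the block comes from the proxy or from the target site's anti-bot layer.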

I have tried using scrapy-selenium in combination with a normal, basic spider.

I did get results on the first few tries, but after that it stopped producing results and always returns a 503 error.

I am using the BrightData Web Unlocker proxy, and headers have also been added. So I am not sure what else I can do to load the first URL, which is the main page that issues this GET request. (I can also go directly to the API URL, since I have the parameters.)

import random

import scrapy
from scrapy_selenium import SeleniumRequest


class MpvticketSpider(scrapy.Spider):
    name = 'mpvticket'

    urlin = "https://mpv.tickets.com/?agency=MLB_MPV&orgid=10&pid=9016700"
    eventid = urlin.strip().split("pid=")[1]
    urlout = ("https://mpv.tickets.com/api/pvodc/v1/events/navmap/availability/?"
              "pid=" + eventid + "&agency=MLB_MPV&orgId=10&supportsVoucherRedemption=true")

    start_urls = [urlin]
    print("\n START URL BEING RUN: ", start_urls)


    def parse(self, response):
        url = "https://mpv.tickets.com/api/pvodc/v1/events/navmap/availability/?pid=9016700&agency=MLB_MPV&orgId=10&supportsVoucherRedemption=true"
        print("\n FIRST URL BEING RUN: ", url)
        username = 'lum-customer-XXXXX-zone-zone6ticket-route_err-pass_dyn'
        password = 'XXXXX'
        port = XXXX  # BrightData super-proxy port
        session_id = random.random()
        super_proxy_url = ('http://%s-country-us-session-%s:%[email protected]:%d'
                           % (username, session_id, password, port))
        headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:48.0) Gecko/20100101 Firefox/48.0'}
        yield SeleniumRequest(url=url, callback=self.parse_api,
                              meta={'proxy': super_proxy_url}, headers=headers)

    def parse_api(self, response):
        raw_data = response.text
        print(raw_data)
        # More data-extraction code follows. I only need help with the block
        # above and how to avoid the 503 error.
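As a side note on the spider above: splitting the URL on `"pid="` only works while `pid` is the last query parameter. A sketch of a more robust extraction using the standard library's query-string parser (`extract_pid` is a hypothetical helper name):

```python
from urllib.parse import urlparse, parse_qs


def extract_pid(url: str) -> str:
    # Parse the query string properly instead of splitting on "pid=",
    # so the helper still works if other parameters follow pid.
    return parse_qs(urlparse(url).query)["pid"][0]
```

With this, `urlout` can be built from `extract_pid(urlin)` regardless of parameter order.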

Upvotes: 0

Views: 133

Answers (0)
