Is there any way to handle href = '#' in Scrapy?

Question

While trying to scrape all the content from a website called TimesJobs, I was unable to reach the next pages, because the links in the pagination class show href = '#'. I can't follow such hyperlinks, so I can't scrape the data from all the pages. Is there any way to get the actual hyperlink? If so, please answer. Thank you. The link I was trying to access was https://www.timesjobs.com/candidate/job-search.html?searchType=personalizedSearch&from=submit&txtKeywords=python&txtLocation=bangalore

ThePyGuy · Accepted Answer

It's worth noting that you can also play with the result size. I had luck getting 1000 results on one page with the URL below, which will probably help you out a lot. I tried 3400 and it failed, so you'll have to experiment to find the limit. Either way, this should make the task much easier.

https://www.timesjobs.com/candidate/job-search.html?from=submit&actualTxtKeywords=python&searchBy=0&rdoOperator=OR&searchType=personalizedSearch&txtLocation=bangalore&luceneResultSize=1000&postWeek=60&txtKeywords=python&pDate=I&sequence=2&startPage=1
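If you want to experiment with the result size without hand-editing that long URL, you could build it from its query parameters. This is a minimal sketch: build_search_url is a hypothetical helper, and the parameter names and values are simply copied from the URL above. result_size maps to luceneResultSize, which (per the observation above) worked at 1000 but not at 3400.

```python
from urllib.parse import urlencode

BASE = "https://www.timesjobs.com/candidate/job-search.html"


def build_search_url(keyword, location, result_size=1000, sequence=1):
    """Hypothetical helper: rebuild the TimesJobs search URL from its parameters.

    Parameter names/values are copied from the URL in the answer; the limits
    of luceneResultSize have to be found by experiment.
    """
    params = {
        "from": "submit",
        "actualTxtKeywords": keyword,
        "searchBy": 0,
        "rdoOperator": "OR",
        "searchType": "personalizedSearch",
        "txtLocation": location,
        "luceneResultSize": result_size,  # results per page (assumed)
        "postWeek": 60,
        "txtKeywords": keyword,
        "pDate": "I",
        "sequence": sequence,  # page number
        "startPage": 1,        # stays at 1
    }
    # urlencode keeps the dict's insertion order and stringifies the ints
    return f"{BASE}?{urlencode(params)}"


print(build_search_url("python", "bangalore", result_size=1000, sequence=2))
```

With these arguments the printed URL is identical to the one above, so you can change result_size and re-run to probe the limit.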

This does not solve the problem of navigating to '#', but it does solve the problem of scraping all the results. Also note that startPage always stays at 1; the site uses the sequence parameter to paginate.
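As for why the '#' links can't simply be followed: an href of '#' is a fragment-only reference, so resolving it against the current page URL gives back the page you are already on, not the next one. A minimal check with the standard library, using the URL from the question (Scrapy's response.urljoin() resolves relative links against the response URL in the same way):

```python
from urllib.parse import urljoin

# The search results page from the question
page = ("https://www.timesjobs.com/candidate/job-search.html"
        "?searchType=personalizedSearch&from=submit"
        "&txtKeywords=python&txtLocation=bangalore")

# Resolving '#' keeps the base URL's path and query; at most an empty
# fragment is appended, so it points back at the same page.
next_href = urljoin(page, "#")
print(next_href.rstrip("#") == page)  # True
```

Since Scrapy's duplicate filter ignores URL fragments by default, a request built from such a link will typically be dropped as a duplicate. The pagination is presumably driven by JavaScript instead, which is why rebuilding the URL with the sequence parameter works.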

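import scrapy

class TimesJobsSpider(scrapy.Spider):  # hypothetical name; self.parse is your own callback
    name = 'timesjobs'
    # sequence={} receives the page number; startPage always stays at 1
    start_urls = ['https://www.timesjobs.com/candidate/job-search.html?from=submit&actualTxtKeywords=python&searchBy=0&rdoOperator=OR&searchType=personalizedSearch&txtLocation=bangalore&luceneResultSize=1000&postWeek=60&txtKeywords=python&pDate=I&sequence={}&startPage=1']

    def start_requests(self):
        for i in range(1, 4):  # sequence = 1, 2, 3
            yield scrapy.Request(self.start_urls[0].format(i), callback=self.parse)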
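The hard-coded range(1, 4) requests three pages, i.e. up to 3000 results at luceneResultSize=1000, assuming (as the 1000-on-one-page observation suggests) that luceneResultSize is the number of results per page. If you know the total number of results, a small sketch like this could derive the sequence values instead. sequence_numbers is a hypothetical helper, and reading the total result count from the site is left to you.

```python
import math


def sequence_numbers(total_results, page_size=1000):
    """Hypothetical helper: sequence values needed to cover total_results.

    page_size should match the luceneResultSize used in the URL.
    """
    pages = math.ceil(total_results / page_size)
    return range(1, pages + 1)


print(list(sequence_numbers(2500)))  # [1, 2, 3]
```

In the spider, you would then loop with "for i in sequence_numbers(total, 1000):" in place of range(1, 4).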
