Can't get rid of a loop when certain conditions are met

Question

I've created a script in Python to get the first 400 links from Bing search results. There is no guarantee that there will always be at least 400 results; in this case there are around 300. The landing page shows 10 results, and the rest can be reached by traversing the next pages. The problem is that when there is no next-page link left, the webpage displays the last results over and over again.

The search keyword is michael jackson, and this is a full-fledged link.

How can I break out of the loop when there are no more new results, or when there are fewer than 400 results?

I've tried with:

import time
import requests
from bs4 import BeautifulSoup

link = "https://www.bing.com/search?"

params = {'q': 'michael jackson','first': ''}

def get_bing_results(url):
    q = 1
    while q<=400:
        params['first'] = q
        res = requests.get(url,params=params,headers={
            "User-Agent":"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36"
            })
        soup = BeautifulSoup(res.text,"lxml")
        for link in soup.select("#b_results h2 > a"):
            print(link.get("href"))

        time.sleep(2)
        q+=10

if __name__ == '__main__':
    get_bing_results(link)

Sam · Accepted Answer

As I mentioned in the comments, you could keep the previous page's parsed HTML and break out of the loop as soon as the next page comes back identical:

import time
import requests
from bs4 import BeautifulSoup

link = "https://www.bing.com/search?"

params = {'q': 'michael jackson','first': ''}

def get_bing_results(url):
    q = 1
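    prev_soup = str()  # HTML of the previous page; empty before the first request
    while q <= 400:
        params['first'] = q
        res = requests.get(url,params=params,headers={
            "User-Agent":"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36"
            })
        soup = BeautifulSoup(res.text,"lxml")
        if str(soup) != prev_soup:
            # The page differs from the previous one: print its links
            # and remember it for the next comparison
            for link in soup.select("#b_results h2 > a"):
                print(link.get("href"))
            prev_soup = str(soup)
        else:
            # The same page came back again: no more results, so stop
            break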
        time.sleep(2)
        q+=10

if __name__ == '__main__':
    get_bing_results(link)
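
A variation on the same idea, in case comparing the whole page turns out to be fragile (the HTML may differ between requests even when the results are the same): compare the extracted links instead, and stop as soon as a page contributes no link that hasn't been seen already. This is only a sketch under assumptions. The names collect_links and fetch_page and the parameters limit, delay and step are made up for illustration, and the selector is taken from the question as-is.

```python
import time

URL = "https://www.bing.com/search"
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36"
}


def fetch_page(query, first):
    """Fetch one results page and return its result links in page order."""
    # Imported here so the loop logic below can be exercised
    # without the network libraries installed.
    import requests
    from bs4 import BeautifulSoup

    res = requests.get(URL, params={"q": query, "first": first},
                       headers=HEADERS, timeout=10)
    res.raise_for_status()
    soup = BeautifulSoup(res.text, "lxml")
    return [a.get("href") for a in soup.select("#b_results h2 > a")]


def collect_links(query, limit=400, fetch=fetch_page, delay=2, step=10):
    """Collect up to `limit` unique links, stopping when a page adds nothing new."""
    seen = set()   # links already collected, for fast membership checks
    links = []     # the same links, kept in the order they were found
    first = 1
    while len(links) < limit:
        added = 0
        for href in fetch(query, first):
            if href and href not in seen:
                seen.add(href)
                links.append(href)
                added += 1
        if added == 0:
            # Either an empty page or a repeat of earlier results: stop.
            break
        first += step
        time.sleep(delay)  # be polite between requests
    return links[:limit]
```

Calling collect_links("michael jackson") would then return a list of at most 400 links, which can be printed one per line. Passing a different fetch function makes it possible to try the stopping logic without hitting the network.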
