BeautifulSoup printing the same results twice

Question

import re
import time

import requests
from bs4 import BeautifulSoup

URL = "https://bitcointalk.org/index.php?board=1.0"
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
numberOfPages = 0
currentPage = 0
counter = 1

# The second-to-last "navPages" link holds the total number of pages
for blabla in soup.find_all("a", attrs={"class": "navPages"})[-2]:
    numberOfPages = int(blabla.string)
    print("Pages count: " + str(numberOfPages))

# Visit each page of the board and print its topic titles
for i in range(0, numberOfPages):
    URLX = "https://bitcointalk.org/index.php?board=1." + str(currentPage)
    print(URLX)
    print("------------------------------------------------- Page count is: " + str(counter))
    counter += 1
    currentPage += 20
    page1 = requests.get(URLX)
    soup1 = BeautifulSoup(page1.content, 'html.parser')
    time.sleep(1.0)
    for span in soup1.find_all("span", attrs={"id": re.compile("^msg")}):
        for b in span.find_all('a', href=True):
            print(b.string)

I'm trying to go through all the pages of the "Bitcoin Discussion" board and print the topic names from each page. It works, but for some reason it prints each page's topic names twice while going through the different pages. For example:

URL (first page): https://bitcointalk.org/index.php?board=1.0

would print its actual content:

ABC123

anotherTopic

Then, even when the URL changes to the second page, it still prints the same topics.

The same thing happens for all the other pages: each page's topics get printed twice, even though the URL changes every time.

Any thoughts? This is my first experience with Python and BeautifulSoup.

Krishna Chaurasia · Accepted Answer

The URLs of consecutive pages differ by 40 in the number after "board=1.", not by 20:

https://bitcointalk.org/index.php?board=1.0
https://bitcointalk.org/index.php?board=1.40
https://bitcointalk.org/index.php?board=1.80
https://bitcointalk.org/index.php?board=1.120
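The pattern in these URLs can be checked with a small offline sketch. It assumes, from the URLs alone, that each board page holds 40 topics and that the number after "board=1." is the offset of the first topic. The served_page function is a hypothetical model of the server (it rounds the offset down to a page boundary), chosen only because it reproduces the "every page twice" output described in the question.

```python
BASE = "https://bitcointalk.org/index.php?board=1."
TOPICS_PER_PAGE = 40  # assumption inferred from the URLs listed above


def page_url(page_index):
    """URL of the page_index-th board page (0-based), stepping the offset by 40."""
    return BASE + str(page_index * TOPICS_PER_PAGE)


def served_page(offset):
    """Hypothetical model: the forum rounds the offset down to a multiple of 40.

    Returns the 1-based page number that would be served for `offset`.
    """
    return offset // TOPICS_PER_PAGE + 1


# The four URLs listed above
print([page_url(i) for i in range(4)])

# Stepping by 20 (as in the question) requests every page twice under this model
print([served_page(offset) for offset in range(0, 160, 20)])  # [1, 1, 2, 2, 3, 3, 4, 4]

# Stepping by 40 requests each page exactly once
print([served_page(offset) for offset in range(0, 160, 40)])  # [1, 2, 3, 4]
```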

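So the loop should use currentPage += 40 instead of currentPage += 20. With a step of 20, two consecutive requests land on the same 40-topic page (the forum apparently serves the same topics for offset 20 as for offset 0), which is why every page's topics print twice. It also means the loop stops only about halfway through the board, because numberOfPages requests with a step of 20 reach offset 20 * (numberOfPages - 1) at most.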
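For reference, a minimal sketch of the question's script with the offset step changed to 40. It keeps the question's selectors (the second-to-last a.navPages link for the page count, and links inside span elements whose id starts with "msg" for topic titles); whether those selectors still match the live site is an assumption. The helper names page_count, topic_titles and fetch_soup are illustrative, not part of the original code, and main() is deliberately not called automatically because it makes live requests.

```python
import re
import time

import requests
from bs4 import BeautifulSoup

BASE_URL = "https://bitcointalk.org/index.php?board=1."
TOPICS_PER_PAGE = 40  # offset step between consecutive board pages


def page_count(soup):
    """Read the total page count from the second-to-last a.navPages link."""
    nav_links = soup.find_all("a", attrs={"class": "navPages"})
    return int(nav_links[-2].get_text(strip=True))


def topic_titles(soup):
    """Collect the text of links inside <span id="msg..."> elements."""
    titles = []
    for span in soup.find_all("span", attrs={"id": re.compile("^msg")}):
        for link in span.find_all("a", href=True):
            titles.append(link.get_text(strip=True))
    return titles


def fetch_soup(url):
    """Download a page and parse it with the built-in html.parser."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return BeautifulSoup(response.content, "html.parser")


def main():
    number_of_pages = page_count(fetch_soup(BASE_URL + "0"))
    print("Pages count: " + str(number_of_pages))

    for page_index in range(number_of_pages):
        # Offsets go 0, 40, 80, ... so each request returns a different page
        url = BASE_URL + str(page_index * TOPICS_PER_PAGE)
        print(url)
        print("----- Page count is: " + str(page_index + 1))
        for title in topic_titles(fetch_soup(url)):
            print(title)
        time.sleep(1.0)  # pause between requests, as in the question


# main() makes live HTTP requests; call it explicitly to run the scraper.
```

Splitting parsing from downloading means page_count and topic_titles can be tried on saved HTML without hitting the forum.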
