Why does my program only output the last page of a multiple page scraping operation?

Question

I am trying to scrape multiple pages with BeautifulSoup, but I only get the results from the last page as output. What is the right way to do this? My code is below.

# For every page 

for page in range(0,8):
    # Make a get request
    response = get('http://nationalacademyhr.org/fellowsdirectory?page=0%2C{}' + format(page))
    # Pause the loop
    sleep(randint(8,15))
    # Monitor the requests
    requests += 1
    elapsed_time = time() - start_time
    print('Request:{}; Frequency: {} requests/s'.format(requests, requests/elapsed_time))
    clear_output(wait = True)

    html_soup = BeautifulSoup(response.text, 'html.parser')
    all_table_info = html_soup.find('table', class_ = "views-table cols-4")


    for name in all_table_info.find_all('div',
            class_="views-field views-field-view"):
        names.append(name.text.replace("\n", " ") if name.text else None)

    for organization in all_table_info.find_all('td',
            class_="views-field views-field-field-employer"):
        orgs.append(organization.text.strip() if organization.text else None)

    for year in all_table_info.find_all('td',
            class_="views-field views-field-view-2"):
        Years.append(year.text.strip() if year.text else None)


    df = pd.DataFrame({'Name' : names, 'Org' : orgs, 'year' : Years })

    print (df)

hancar · Accepted Answer

There is a typo: a plus sign where a dot should be. You need 'http://nati...ge=0%2C{}'.format(page), but you wrote 'http://nati...ge=0%2C{}' + format(page).

URLs that still contain the braces before the page number all end up at the same page.

EDIT:

To be clear, you only need to change the line response = get('http://nationalacademyhr.org/fellowsdirectory?page=0%2C{}' + format(page)) to response = get('http://nationalacademyhr.org/fellowsdirectory?page=0%2C{}'.format(page))

In the first case, the built-in format(page) only returns the page number as a string, and + appends it to the end of the URL. The placeholder is never filled in, so the resulting URL still contains the substring '{}', which causes the problem.
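
A minimal sketch of the difference, which builds the URLs without sending any request. The helper names buggy_url and fixed_url are made up for this illustration and are not part of the original code:

```python
BASE = 'http://nationalacademyhr.org/fellowsdirectory?page=0%2C{}'

def buggy_url(page):
    # '+' concatenates: the built-in format(page) just returns str(page),
    # so the '{}' placeholder is left in the URL untouched.
    return BASE + format(page)

def fixed_url(page):
    # str.format replaces the '{}' placeholder with the page number.
    return BASE.format(page)

for page in range(3):
    print(buggy_url(page))
    print(fixed_url(page))
```

Each buggy URL ends in '%2C{}' followed by the page number, while the fixed one ends in '%2C' followed directly by the page number.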
