Aparna Prasad

Reputation: 91

How do I extract data from linked pages on a website using Python?

I have been trying to scrape data from web pages for a data analytics project, and I have managed to get the data from a single page.

import requests
from bs4 import BeautifulSoup
import concurrent.futures
from urllib.parse import urlencode
from scraper_api import ScraperAPIClient

# Fetch the first page of undergraduate course search results
client = ScraperAPIClient('key')
results = client.get(url="https://www.essex.ac.uk/course-search?query=&f.Level%7CcourseLevel=Undergraduate").text

print(results)

For example, on the site "https://www.essex.ac.uk/course-search?query=&f.Level%7CcourseLevel=Undergraduate" I need to navigate into each course page and get a single piece of data, the duration, from that page.

Upvotes: 1

Views: 106

Answers (2)

Sudipto

Reputation: 299

import requests
from bs4 import BeautifulSoup
import concurrent.futures
from urllib.parse import urlencode
from scraper_api import ScraperAPIClient

client = ScraperAPIClient('key')
total_pages = 12
for page_no in range(total_pages):
    # You control the page_no variable.
    # Open the website and check how the URL changes when you go to the next page:
    # it depends on the 'start_rank' parameter at the end of the URL.
    # For example, start_rank=10, start_rank=20, ... fetch one page after another.
    rank = page_no * 10
    results = client.get(url="https://www.essex.ac.uk/course-search?query=&f.Level%7CcourseLevel=Undergraduate&start_rank={0}".format(rank)).text
    print(results)
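
If you would rather not format the query string by hand, the same page URLs can be built with `urlencode`, which the question already imports. This is only a sketch: `build_page_urls`, `page_size` and `first_rank` are names I made up for illustration, and the page size of 10 and the starting offset are assumptions you should check against the site.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.essex.ac.uk/course-search"


def build_page_urls(total_pages, page_size=10, first_rank=0):
    """Return one search URL per results page (hypothetical helper).

    page_size and first_rank are assumptions about how the site paginates.
    """
    urls = []
    for page_no in range(total_pages):
        params = {
            "query": "",
            # urlencode turns the '|' into %7C, as in the original URL
            "f.Level|courseLevel": "Undergraduate",
            "start_rank": first_rank + page_no * page_size,
        }
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls
```

Each URL returned by `build_page_urls(12)` can then be passed to `client.get(url=...)` in the loop above.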

Upvotes: 1

AmineBTG

Reputation: 697

Try the following:

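from scraper_api import ScraperAPIClient

client = ScraperAPIClient('key')
results = []
# start_rank takes the values 01, 11, 21, ..., 91: one request per results page
for i in range(10):
    results.append(client.get(url=f"https://www.essex.ac.uk/course-search?query=&f.Level%7CcourseLevel=Undergraduate&start_rank={i}1").text)

print(results)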

This loops through the 10 results pages and appends the text of each response to the results list.
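
To reach the linked course pages the question asks about, each HTML string in `results` still has to be parsed for course links, and each course page then has to be fetched and searched for the duration. Below is a rough sketch using only the standard library (BeautifulSoup would work just as well). I have not inspected the site's markup, so the `/courses/` URL fragment and the "Duration ... N years" text pattern are assumptions, and `LinkCollector`, `extract_course_links` and `extract_duration` are hypothetical helper names.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs of <a> tags whose href contains a fragment."""

    def __init__(self, base_url, must_contain):
        super().__init__()
        self.base_url = base_url
        self.must_contain = must_contain
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.must_contain in href:
            url = urljoin(self.base_url, href)
            if url not in self.links:  # skip duplicate links
                self.links.append(url)


def extract_course_links(html, base_url="https://www.essex.ac.uk",
                         must_contain="/courses/"):
    # ASSUMPTION: course pages are linked with '/courses/' in the href
    parser = LinkCollector(base_url, must_contain)
    parser.feed(html)
    return parser.links


def extract_duration(html):
    # ASSUMPTION: the page shows the label 'Duration' followed by 'N years'
    text = re.sub(r"<[^>]+>", " ", html)  # crude tag stripping
    match = re.search(r"Duration\W+(\d+(?:\.\d+)?\s+years?)", text,
                      re.IGNORECASE)
    return match.group(1) if match else None
```

With these in place, you would call `extract_course_links(page_html)` for every entry in `results`, fetch each returned link with `client.get(url=link).text`, and pass that text to `extract_duration`. If the duration comes back as `None`, adjust the pattern to match what the course page actually contains.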

Upvotes: 1
