NK20

Reputation: 75

BeautifulSoup - Scrape multiple pages

I want to scrape the names of the members from each page, then move on to the next page and do the same. My code only works for one page. I'm very new to this, so any advice would be appreciated. Thank you.

    import requests
    from bs4 import BeautifulSoup

    r = requests.get("https://www.bodia.com/spa-members/page/1")
    soup = BeautifulSoup(r.text,"html.parser")
    lights = soup.findAll("span",{"class":"light"})

    lights_list = []
    for l in lights[0:]:
        result = l.text.strip()
        lights_list.append(result)

    print (lights_list)

I tried this, and it only gives me the members of page 3:

    for i in range (1,4): #to scrape names of page 1 to 3
        r = requests.get("https://www.bodia.com/spa-members/page/"+ format(i))
    soup = BeautifulSoup(r.text,"html.parser")
    lights = soup.findAll("span",{"class":"light"})

    lights_list = []
    for l in lights[0:]:
        result = l.text.strip()
        lights_list.append(result)

    print (lights_list)

Then I tried this:

    i = 1
    while i<5:
        r = requests.get("https://www.bodia.com/spa-members/page/"+str(i))
        i+=1

        soup = BeautifulSoup(r.text,"html.parser")
        lights = soup.findAll("span",{"class":"light"})

        lights_list = []
        for l in lights[0:]:
            result = l.text.strip()
        lights_list.append(result)

        print (lights_list)

It gives me the names of 4 members, but I don't know which page each one comes from:

    ['Seng Putheary (Nana)']
    ['Marco Julia']
    ['Simon']
    ['Ms Anne Guerineau']

Upvotes: 3

Views: 483

Answers (1)

Matthew Gaiser

Reputation: 4763

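Two changes get it to scrape everything, although only the second one is actually required.

  1. `r = requests.get("https://www.bodia.com/spa-members/page/"+ format(i))` can be written as `r = requests.get("https://www.bodia.com/spa-members/page/{}".format(i))`. Strictly speaking, the original also works: the built-in `format(i)` returns the string `'1'` for `i = 1`, so the concatenation builds the same URL. The `str.format` (or `str(i)`) spelling is simply the more common idiom.

  2. The real problem is that you were not looping over all the code. Only the `requests.get` line was inside the `for` loop, so each iteration overwrote `r`, and the parsing and printing ran just once, after the loop had finished, on the last page (page 3). Indenting everything under the `for` loop fixes that.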
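Both points can be checked without touching the network. The snippet below builds the page URL three ways to show they agree, then uses made-up per-page name lists (the `fake_pages` dictionary is hypothetical stand-in data, not content from the site) to show what happens when only the assignment sits inside the loop versus when everything is indented under it.

```python
# Point 1: these spellings all build the same URL for page 2.
i = 2
url_concat = "https://www.bodia.com/spa-members/page/" + format(i)  # built-in format(2) returns "2"
url_method = "https://www.bodia.com/spa-members/page/{}".format(i)  # str.format, as in the fixed code
url_str = "https://www.bodia.com/spa-members/page/" + str(i)        # str(), as in the while-loop attempt
print(url_concat == url_method == url_str)  # True

# Point 2: hypothetical per-page name lists standing in for downloaded HTML.
fake_pages = {1: ["a1", "a2"], 2: ["b1", "b2"], 3: ["c1", "c2"]}

# Only the "fetch" is inside the loop: each pass overwrites page_names,
# so code after the loop only ever sees the last page.
for i in range(1, 4):
    page_names = fake_pages[i]
last_only = page_names
print(last_only)  # ['c1', 'c2']

# Everything indented under the loop: every page contributes its names.
all_names = []
for i in range(1, 4):
    page_names = fake_pages[i]
    all_names.extend(page_names)
print(all_names)  # ['a1', 'a2', 'b1', 'b2', 'c1', 'c2']
```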

    import requests
    from bs4 import BeautifulSoup

    for i in range (1,4): #to scrape names of page 1 to 3
        r = requests.get("https://www.bodia.com/spa-members/page/{}".format(i))
        soup = BeautifulSoup(r.text,"html.parser")
        lights = soup.findAll("span",{"class":"light"})
        lights_list = []
        for l in lights[0:]:
            result = l.text.strip()
            lights_list.append(result)

        print(lights_list)

When I ran the code above, it printed one list of names for each page it scraped (pages 1 to 3), roughly every 3 seconds.
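If you would rather end up with one combined list and still know which page each name came from (the information the while-loop attempt lost), the loop can record the page number next to each name. This is a minimal sketch, assuming the site still uses the `/spa-members/page/<n>` URL scheme and the `span.light` markup from the question; `parse_member_names`, `fetch_page` and `scrape_members` are hypothetical helper names, and the 10-second timeout is an arbitrary choice. The downloader is passed in as `fetch`, so the parsing can be tried on hard-coded HTML without hitting the site.

```python
from bs4 import BeautifulSoup


def parse_member_names(html):
    """Return the stripped text of every <span class="light"> on one page."""
    soup = BeautifulSoup(html, "html.parser")
    return [span.text.strip() for span in soup.find_all("span", class_="light")]


def fetch_page(page):
    """Download one listing page (assumes the /page/<n> URL scheme from the question)."""
    import requests  # imported here so the parsing part can be tried offline

    url = "https://www.bodia.com/spa-members/page/{}".format(page)
    response = requests.get(url, timeout=10)  # timeout value is an arbitrary assumption
    response.raise_for_status()  # stop on an HTTP error instead of parsing an error page
    return response.text


def scrape_members(pages, fetch=fetch_page):
    """Collect (page, name) pairs for every page in `pages` into one list."""
    results = []
    for page in pages:
        # Fetching AND parsing both happen inside the loop, once per page.
        html = fetch(page)
        for name in parse_member_names(html):
            results.append((page, name))
    return results


# Usage (needs network access):
# for page, name in scrape_members(range(1, 4)):
#     print(page, name)
```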

Upvotes: 2
