Reputation: 47
I have code that scrapes the first page, but the URL changes on the following pages, from:
https://www.expansion.com/empresas-de/ganaderia/granjas-en-general/index.html --> https://www.expansion.com/empresas-de/ganaderia/granjas-en-general/2.html
import requests
from bs4 import BeautifulSoup

url = "https://www.expansion.com/empresas-de/ganaderia/granjas-en-general/index.html"
page = requests.get(url)
soup = BeautifulSoup(page.content, "html.parser")
lists = soup.select("div#simulacion_tabla ul")
for lis in lists:
    title = lis.find('li', class_="col1").text
    location = lis.find('li', class_="col2").text
    province = lis.find('li', class_="col3").text
    info = [title, location, province]
How can I create a loop that runs from page 2 to page 65? Many thanks!
Upvotes: 1
Views: 3100
Reputation: 301
First of all, please format your code properly so everyone can read it. Here you can find more.
This could be a potential solution. It is far from optimized, but you can take some inspiration from it.
import requests
from bs4 import BeautifulSoup

def scrape_page(url):
    """ Scrape the given url and return the bs4 ResultSet """
    page = requests.get(url)
    soup = BeautifulSoup(page.content, "html.parser")
    table = soup.select("div#simulacion_tabla ul")
    return table

def extract_rows(table):
    """ Extract rows as [title, location, province] lists """
    rows = []
    for row in table:
        title = row.find('li', class_="col1").text
        location = row.find('li', class_="col2").text
        province = row.find('li', class_="col3").text
        rows.append([title, location, province])
    return rows

big_table = []

# The first page is index.html
index = scrape_page("https://www.expansion.com/empresas-de/ganaderia/granjas-en-general/index.html")
for row in extract_rows(index):
    big_table.append(row)

# Pages 2 to 65 (the upper bound of range is exclusive)
for x in range(2, 66):
    index = scrape_page("https://www.expansion.com/empresas-de/ganaderia/granjas-en-general/" + str(x) + ".html")
    for row in extract_rows(index):
        big_table.append(row)

print(big_table)
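As a variation on the same idea, the first page and the numbered pages can be handled by one loop if the URLs are generated up front. This is only a sketch: the names `BASE`, `page_urls` and `scrape_all`, the one second delay and the 10 second timeout are my own assumptions, and it assumes the site keeps the `index.html`, `2.html`, ... `65.html` pattern described in the question.

```python
BASE = "https://www.expansion.com/empresas-de/ganaderia/granjas-en-general/"


def page_urls(last_page):
    """Yield the URL of every listing page from 1 to last_page inclusive."""
    for n in range(1, last_page + 1):
        # Assumption from the question: page 1 is index.html, the rest are 2.html, 3.html, ...
        name = "index" if n == 1 else str(n)
        yield f"{BASE}{name}.html"


def scrape_all(last_page=65, delay=1.0):
    """Fetch every page and return a list of [title, location, province] rows."""
    # Third-party imports are kept local so page_urls() works without them installed.
    import time
    import requests
    from bs4 import BeautifulSoup

    rows = []
    with requests.Session() as session:
        for url in page_urls(last_page):
            response = session.get(url, timeout=10)
            response.raise_for_status()  # stop early on a 404 or a server error
            soup = BeautifulSoup(response.content, "html.parser")
            for ul in soup.select("div#simulacion_tabla ul"):
                cells = [ul.find("li", class_=c) for c in ("col1", "col2", "col3")]
                # Skip any <ul> that lacks one of the three columns
                if all(cell is not None for cell in cells):
                    rows.append([cell.get_text(strip=True) for cell in cells])
            time.sleep(delay)  # small pause between requests
    return rows
```

Calling `scrape_all()` would then return all rows in one list. Skipping rows with a missing column avoids the `AttributeError` that `.text` raises when `find` returns `None`.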
Upvotes: 1
Reputation: 16189
Here is the working solution:
import requests
from bs4 import BeautifulSoup

# range(1, 66) covers pages 1 to 65; the upper bound is exclusive
for page in range(1, 66):
    # Per the question, the first page is index.html; the rest are 2.html, 3.html, ...
    name = "index" if page == 1 else page
    url = "https://www.expansion.com/empresas-de/ganaderia/granjas-en-general/{name}.html".format(name=name)
    #print(url)
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    lists = soup.select("div#simulacion_tabla ul")
    for lis in lists:
        title = lis.find('li', class_="col1").text
        location = lis.find('li', class_="col2").text
        province = lis.find('li', class_="col3").text
        info = [title, location, province]
        print(info)
Upvotes: 1