Reputation: 5578
Hi, I'm trying to learn how to scrape elements with Python. I was trying to get the title of a web page (local.ch), but my code is not working and I don't know why.
Here is the Python code:
    import requests
    from bs4 import BeautifulSoup

    def spider(max_pages):
        page = 2
        while page < max_pages:
            url = 'http://yellow.local.ch/fr/q/Morges/Bar.html?page=' + str(page)
            source_code = requests.get(url)
            plain_text = source_code.text
            soup = BeautifulSoup(plain_text)
            for link in soup.findAll('a', {'class':'details-entry-title-link'}):
                title = link.string
                print(title)
        page += 1

    spider(3)
I'm pretty sure the code is correct, and PyCharm doesn't report any errors. Why is it not working?
Upvotes: 0
Views: 1462
Reputation: 1024
You have a major bug in your code:
    page = 1
    while page < max_pages:
        ....
    spider(1)
With page = 1 and max_pages = 1, the condition 1 < 1 is never met, so the rest of your code doesn't get executed! Other problems are an encoding error and a warning about an unspecified parser:
    import requests
    from bs4 import BeautifulSoup

    def spider(max_pages):
        page = 1
        # <= so that the last page (max_pages itself) is visited too
        while page <= max_pages:
            url = 'http://yellow.local.ch/fr/q/Morges/Bar.html?page=' + str(page)
            source_code = requests.get(url)
            plain_text = source_code.text.encode("utf-8")
            # Name the parser explicitly to avoid the "no parser specified" warning
            soup = BeautifulSoup(plain_text, 'html.parser')
            for link in soup.findAll('a', {'class':'details-entry-title-link'}):
                title = link.string
                print(title.encode("utf-8"))
            # Inside the while loop, after the for loop has finished
            page += 1

    spider(1)
Note the .encode("utf-8") part: it turns the title into bytes, so print() shows it in binary form, as you can see from the b prefix. Without this step, print() can raise a UnicodeEncodeError when the console encoding cannot represent some of the characters. The same change is made on the plain_text = source_code.text.encode("utf-8") line.
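As a small offline illustration of the b prefix mentioned above, here is a sketch that needs no network access. The title string is a made-up example, not something taken from local.ch, and the accented letter is written as an escape so the snippet does not depend on the file encoding.

```python
# Hypothetical title containing a non-ASCII character (\u00e9 is "e" with an acute accent).
title = "Caf\u00e9 de la Gare"

# str.encode() returns a bytes object, not text.
encoded = title.encode("utf-8")

# print(encoded) writes the same text as repr(encoded): the b'...' form,
# with the accented letter shown as its two UTF-8 bytes.
print(repr(encoded))

# Decoding with the same codec gives back the original text.
decoded = encoded.decode("utf-8")
```

So the encoded output is safe to print on any console, at the cost of being less readable; decoding it again restores the original string.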
Another bug is the wrong indentation of the page += 1 line. It should be inside the while loop.
Upvotes: 2
Reputation: 763
You are passing 1 as the max_pages argument to the spider function. However, your while loop will only execute if page < max_pages, and 1 < 1 is not true.
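To see the effect of the loop condition without touching the network, here is a minimal sketch. The helper pages_visited and its inclusive flag are hypothetical names introduced only for this illustration; it just records which page numbers the loop would request.

```python
def pages_visited(first_page, max_pages, inclusive=False):
    """Return the page numbers a spider-style while loop would request."""
    visited = []
    page = first_page
    # inclusive=False mimics `page < max_pages`, inclusive=True mimics `page <= max_pages`
    while (page <= max_pages) if inclusive else (page < max_pages):
        visited.append(page)
        page += 1  # incremented inside the while loop, so the loop terminates
    return visited


print(pages_visited(1, 1))                  # strict comparison: loop body never runs
print(pages_visited(1, 1, inclusive=True))  # inclusive comparison: page 1 is visited
print(pages_visited(1, 3, inclusive=True))  # pages 1 through 3
```

With the strict comparison and both values equal to 1, the list stays empty, which is why nothing was printed.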
Upvotes: 1
Reputation: 651
Probably because you intended to initialize the page variable to 0, not 1. As written, the loop is never entered, because page and max_pages both have the same value, which is 1.
Upvotes: 1