Reputation: 93
Hello everyone, I am a beginner and I am trying to use an IF/ELSE condition with URL links in web scraping.
I want to select all the pages for departments 64 to 66.
My URL is: http://www.pour-les-personnes-agees.gouv.fr/annuaire-accueil-de-jour/{}/0 (with {} = 64, 65 or 66).
My loop works and selects all the pages for 64. But inside 65 there is only one page, so this line:
last_page = soup.find('ul', class_='pagination').find('li', class_='next').a['href'].split('=')[1]
cannot work. Here is my code:
import requests
from bs4 import BeautifulSoup

url_list = ['http://www.pour-les-personnes-agees.gouv.fr/annuaire-accueil-de-jour/{}/0']

for link in url_list:
    r = requests.get(link)
    soup = BeautifulSoup(r.content, "html.parser")
    page_Url_test = [link.format(i) for i in range(64, 66)]
    for depart_page in page_Url_test:
        depart_page1 = str(depart_page) + "?page={}"
        r = requests.get(depart_page1)
        soup = BeautifulSoup(r.content, "html.parser")
        last_page = soup.find('ul', class_='pagination').find('li', class_='next').a['href'].split('=')[1]
        dept_page_Url = [depart_page1.format(i) for i in range(0, int(last_page)+1)]
        print(dept_page_Url)
I tried to incorporate an IF ELSE like this:
for depart_page in page_Url_test:
    depart_page1 = str(depart_page) + "?page={}"
    r = requests.get(depart_page1)
    soup = BeautifulSoup(r.content, "html.parser")
    if len(depart_page1) == 0:
        dept_page_Url = depart_page1
    else:
        last_page = soup.find('ul', class_='pagination').find('li', class_='next').a['href'].split('=')[1]
        dept_page_Url = [depart_page1.format(i) for i in range(0, int(last_page)+1)]
    print(dept_page_Url)
But it doesn't work. How can I tell my code: if there is just one page, select just that first one, else do my next step? Any clue? I don't have enough knowledge to figure it out on my own... Thank you a lot.
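To make the goal clearer, here is roughly what I am trying to express, written only as pseudocode in comments (not working code):

# pseudocode of what I want for each department:
# if the department has only one results page:
#     keep just the base URL (page 0)
# else:
#     read the last page number from the pagination links
#     and build the URL for every page from 0 to last_page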
Upvotes: 0
Views: 56
Reputation: 22440
As t.m.adam has already pointed out, you can try the approach below. I have also trimmed your code to make it more concise.
import requests
from bs4 import BeautifulSoup

url_list = 'http://www.pour-les-personnes-agees.gouv.fr/annuaire-accueil-de-jour/{}/0'

for link in [url_list.format(page) for page in range(64, 67)]:
    res = requests.get(link)
    soup = BeautifulSoup(res.text, "lxml")
    depart_page = str(link) + "?page={}"
    if soup.find('ul', class_='pagination'):
        last_page = soup.find('ul', class_='pagination').find('li', class_='next').a['href'].split('=')[1]
        dept_page_Url = [depart_page.format(i) for i in range(0, int(last_page)+1)]
        print(dept_page_Url)
An additional approach, if you also want something printed for the single-page departments:
if soup.find('ul', class_='pagination'):
    last_page = soup.find('ul', class_='pagination').find('li', class_='next').a['href'].split('=')[1]
    dept_page_Url = [depart_page.format(i) for i in range(0, int(last_page)+1)]
    print(dept_page_Url)
else:
    print(link)
Result (department 65 has only one page, so no pagination block is found and nothing is printed for it):
['http://www.pour-les-personnes-agees.gouv.fr/annuaire-accueil-de-jour/64/0?page=0', 'http://www.pour-les-personnes-agees.gouv.fr/annuaire-accueil-de-jour/64/0?page=1', 'http://www.pour-les-personnes-agees.gouv.fr/annuaire-accueil-de-jour/64/0?page=2']
['http://www.pour-les-personnes-agees.gouv.fr/annuaire-accueil-de-jour/66/0?page=0', 'http://www.pour-les-personnes-agees.gouv.fr/annuaire-accueil-de-jour/66/0?page=1', 'http://www.pour-les-personnes-agees.gouv.fr/annuaire-accueil-de-jour/66/0?page=2']
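If you would rather get a list of page URLs for the single-page departments as well (instead of printing the bare link), the two snippets above can be merged. The following is only a sketch built on the same selectors used above (ul with class 'pagination' and li with class 'next'); it assumes that whenever the pagination block or its 'next' link is missing, the department has a single page reachable at ?page=0:

import requests
from bs4 import BeautifulSoup

base_url = 'http://www.pour-les-personnes-agees.gouv.fr/annuaire-accueil-de-jour/{}/0'

for link in [base_url.format(dept) for dept in range(64, 67)]:
    res = requests.get(link)
    soup = BeautifulSoup(res.text, "lxml")
    depart_page = link + "?page={}"

    pagination = soup.find('ul', class_='pagination')
    next_item = pagination.find('li', class_='next') if pagination else None

    if next_item and next_item.a:
        # several pages: read the last page number from the "next" link
        last_page = int(next_item.a['href'].split('=')[1])
    else:
        # assumed: a single page, reachable at ?page=0
        last_page = 0

    dept_page_Url = [depart_page.format(i) for i in range(0, last_page + 1)]
    print(dept_page_Url)

This way every department yields a list, and department 65 would come back as a one-element list.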
Upvotes: 1