Reputation: 53
hey i want to get the last href value in the navigation bar which is this part "/cat/cremes-solaires-indice-30" but i got error that the string index out of range thank you for helping me :)
from bs4 import BeautifulSoup as soup
PAGE_URL = https://www.e.leclerc/fp/sunissime-bb-fluide-protecteur-anti-age-global-spf30-40ml-3508240006457
def product():
response = requests.get(PAGE_URL, headers=HEADERS)
soupe = soup(response.content, 'html5lib')
sauce = soupe.find_all("div", class_="product-breadcrumb")
for x in sauce:
a = x.select('[href]')
print(type(a))
for i in a:
u = i.get("href")
print(u[-2])```
Upvotes: 2
Views: 61
Reputation: 49
The object 'u' is not a dictionary or any other data set storing object , So every time the loop repeats the variable u is being over written . So you can either declare a list or can make the variable global .
Method 1:
from bs4 import BeautifulSoup as soup
PAGE_URL = https://www.e.leclerc/fp/sunissime-bb-fluide-protecteur-anti-age-global-spf30-40ml-3508240006457
def product():
response = requests.get(PAGE_URL, headers=HEADERS)
soupe = soup(response.content, 'html5lib')
sauce = soupe.find_all("div", class_="product-breadcrumb")
for x in sauce:
a = x.select('[href]')
print(type(a))
for i in a:
global u
u = i.get("href")
print(u)
Method 2:
from bs4 import BeautifulSoup as soup
PAGE_URL = https://www.e.leclerc/fp/sunissime-bb-fluide-protecteur-anti-age-global-spf30-40ml-3508240006457
def product():
response = requests.get(PAGE_URL, headers=HEADERS)
soupe = soup(response.content, 'html5lib')
sauce = soupe.find_all("div", class_="product-breadcrumb")
for x in sauce:
a = x.select('[href]')
u = []
print(type(a))
for i in a:
u.append(i.get("href"))
print(u[-1])
P.S : Your code is missing many thing else
Upvotes: 1
Reputation: 3400
You can find main p
tag and from it use find_all
method on a
tag where it returns list of tag from it extract href
from bs4 import BeautifulSoup
import requests
headers={"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36"}
req = requests.get('https://www.e.leclerc/fp/sunissime-bb-fluide-protecteur-anti-age-global-spf30-40ml-3508240006457',headers=headers)
soup = BeautifulSoup(req.text, 'html.parser')
soup.find("p",class_="wrapper d-inline-block").find_all("a")[-1]['href']
Output:
'/cat/cremes-solaires-indice-30'
Upvotes: 2