Reputation: 535
I'm trying to scrape the hrefs from this page.
There are two types of product: those that are highlighted and those that aren't. I want the latter. The CSS classes associated with these products aren't the same, which is why I tried to use them.
For the moment, I just tried to output the li elements I was interested in.
from bs4 import BeautifulSoup
import urllib.request
from collections import *
from statistics import mean
list_url=[]
url = 'http://m.zooplus.co.uk/shop/pet_food/royal_canin_food/rc_size_dog'
response = urllib.request.urlopen(url)
html = response.read()
soup = BeautifulSoup(html,"html.parser")
product_list = soup.find_all("li", {"class":"list-item"})
for elem in product_list:
    print("////////////////BEGIN//////////")
    print(elem)
    print("///////////////END/////////////")
Output:
<li class="list-item highlighted">
That's the kind of product I don't want.
As well as:
<li class="list-item ">
That's the kind of product I want.
Does Beautiful Soup treat a <li class="list-item "> and a <li class="list-item highlighted"> the same way?
What did I miss?
EDIT for Yogi :
from bs4 import BeautifulSoup
import urllib.request
from collections import *
from statistics import mean
list_url=[]
url = 'http://m.zooplus.co.uk/shop/pet_food/royal_canin_food/rc_size_dog'
response = urllib.request.urlopen(url)
html = response.read()
soup = BeautifulSoup(html,"html.parser")
product_list = soup.find_all("li", {"class":"list-item","id": lambda L: L !="special"})
for elem in product_list:
    print("////////////////BEGIN//////////")
    print(elem)
    print("///////////////END/////////////")
Upvotes: 0
Views: 46
Reputation: 51
As I understand it, you want only the items that are not highlighted.
This works:
from bs4 import BeautifulSoup
import urllib.request
from collections import *
from statistics import mean
import time
import re
list_url=[]
url = 'http://m.zooplus.co.uk/shop/pet_food/royal_canin_food/rc_size_dog'
response = urllib.request.urlopen(url)
html = response.read()
soup = BeautifulSoup(html,"html.parser")
rows = soup.find_all('li',{'class':re.compile('list-item.*')})
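for row in rows:
    # "class" is multi-valued, so Beautiful Soup returns it as a list of class names
    cls = row.attrs.get("class")
    # keep only the items that do not carry the "highlighted" class
    if "highlighted" not in cls:
        print(row.text)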
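Since the question asks for the hrefs rather than the text, here is a minimal sketch of the same filter that collects the links instead. It runs on a small made-up HTML snippet rather than the live page: SAMPLE_HTML, the function name non_highlighted_hrefs, and the assumption that each li contains an a tag with an href are all mine, not taken from the real markup, so adjust them to what the page actually contains.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page markup; the real page may be structured differently.
SAMPLE_HTML = """
<ul>
  <li class="list-item highlighted"><a href="/promo">Promo</a></li>
  <li class="list-item "><a href="/shop/a">Product A</a></li>
  <li class="list-item "><a href="/shop/b">Product B</a></li>
</ul>
"""


def non_highlighted_hrefs(html):
    """Return the href of the first link in every non-highlighted list item."""
    soup = BeautifulSoup(html, "html.parser")
    hrefs = []
    # Matching on "list-item" also returns the highlighted items, because
    # Beautiful Soup compares against each class name of the element separately.
    for li in soup.find_all("li", class_="list-item"):
        classes = li.get("class", [])
        if "highlighted" in classes:
            continue  # skip the highlighted products
        link = li.find("a", href=True)  # first <a> that has an href attribute
        if link is not None:
            hrefs.append(link["href"])
    return hrefs


if __name__ == "__main__":
    print(non_highlighted_hrefs(SAMPLE_HTML))
```

This also shows why the original find_all returned both kinds of li: the class attribute is split into separate class names, so a search for "list-item" matches any element that has that name among its classes, and the highlighted ones have to be excluded explicitly.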
Upvotes: 1