BoobaGump

Reputation: 535

Beautiful Soup can't tell CSS classes apart

I'm trying to scrape the hrefs from this page.

There are two types of product: those that are highlighted and those that aren't. I want the latter. The CSS classes associated with the two types aren't the same, which is why I tried to use them.

For the moment, I just tried to output the li elements I was interested in.

from bs4 import BeautifulSoup
import urllib.request
from collections import *
from statistics import mean

list_url=[]

url = 'http://m.zooplus.co.uk/shop/pet_food/royal_canin_food/rc_size_dog'
response = urllib.request.urlopen(url)
html = response.read()
soup = BeautifulSoup(html,"html.parser")

product_list = soup.find_all("li", {"class":"list-item"})
for elem in product_list:
    print("////////////////BEGIN//////////")
    print(elem)
    print("///////////////END/////////////")

Output:

<li class="list-item highlighted">

That's the kind of product I don't want.

And also:

<li class="list-item ">

That's the product I want.

Does Beautiful Soup treat a <li class="list-item "> and a <li class="list-item highlighted"> the same way?

What did I miss?

EDIT for Yogi:

from bs4 import BeautifulSoup
import urllib.request
from collections import *
from statistics import mean

list_url=[]

url = 'http://m.zooplus.co.uk/shop/pet_food/royal_canin_food/rc_size_dog'
response = urllib.request.urlopen(url)
html = response.read()
soup = BeautifulSoup(html,"html.parser")
product_list = soup.find_all("li", {"class":"list-item","id": lambda L: L !="special"})
for elem in product_list:
    print("////////////////BEGIN//////////")
    print(elem)
    print("///////////////END/////////////")

Upvotes: 0

Views: 46

Answers (1)

Yogiraj Banerji

Reputation: 51

As I understand it, this is what you mean: highlighted vs. not highlighted.

This works:

from bs4 import BeautifulSoup
import urllib.request
import re

url = 'http://m.zooplus.co.uk/shop/pet_food/royal_canin_food/rc_size_dog'
response = urllib.request.urlopen(url)
html = response.read()
soup = BeautifulSoup(html, "html.parser")

# Find every <li> that has a class matching "list-item"
# (this still returns both the highlighted and the plain items)
rows = soup.find_all('li', {'class': re.compile('list-item.*')})

for row in rows:
    # "class" is multi-valued, so this is a list such as ["list-item", "highlighted"]
    cls = row.attrs.get("class")
    # Keep only the items that do not carry the "highlighted" class
    if "highlighted" not in cls:
        print(row.text)
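
To explain why the original attempt returned both kinds of item: Beautiful Soup treats class as a multi-valued attribute, so searching for "list-item" matches any element whose class list contains that value, whatever other classes it also has. As an alternative to filtering in a loop, a CSS selector with :not() may do the exclusion in one step. The sketch below is only an illustration: it runs on made-up markup instead of the live page, SAMPLE_HTML and plain_item_hrefs are hypothetical names, it assumes each li contains an a tag with an href, and it assumes a Beautiful Soup version recent enough (4.7 or later) for select() to support :not().

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
# The <a href> inside each <li> is an assumption about the page structure.
SAMPLE_HTML = """
<ul>
  <li class="list-item highlighted"><a href="/shop/promo">Promo</a></li>
  <li class="list-item "><a href="/shop/product-1">Product 1</a></li>
  <li class="list-item "><a href="/shop/product-2">Product 2</a></li>
</ul>
"""


def plain_item_hrefs(html):
    """Return the hrefs of links inside non-highlighted list items."""
    soup = BeautifulSoup(html, "html.parser")
    # li.list-item:not(.highlighted) keeps <li> elements that have the
    # "list-item" class but do not have the "highlighted" class.
    links = soup.select("li.list-item:not(.highlighted) a[href]")
    return [a["href"] for a in links]


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Searching on "list-item" alone matches all three <li> elements,
# because each one has "list-item" somewhere in its class list.
all_items = soup.find_all("li", {"class": "list-item"})

hrefs = plain_item_hrefs(SAMPLE_HTML)

print(len(all_items))
print(hrefs)
```

On the real page you would pass the downloaded html to plain_item_hrefs instead of SAMPLE_HTML, and adjust the selector if the links are nested differently.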

Upvotes: 1
