Python & Beautifulsoup 4 - Unable to filter classes?

Question

I'm trying to scrape shoe sizes from this URL: http://www.jimmyjazz.com/mens/footwear/jordan-retro-13--atmosphere-grey-/414571-016?color=Grey

What I'm trying to do is get only the sizes that are available, i.e. only those that aren't greyed out.

The sizes are all wrapped in <a> elements. The available sizes have the class box, and the unavailable ones have the classes box piunavailable.

I have tried using a lambda function, ifs and CSS selectors, and none of them seem to work. My guess is that it's because of the way my code is structured.

The way it's structured is as follows:

The if attempt

size = soup2.find('div', attrs={'class': 'psizeoptioncontainer'})
getsize = str([e.get_text() for e in size.findAll('a', attrs={'class': 'box'}) if 'piunavailable' not in e.attrs['class']])

The lambda attempt

size = soup2.find('div', attrs={'class': 'psizeoptioncontainer'})
getsize = str([e.get_text() for e in size.findAll(lambda tag: tag.name == 'a' and tag.get('class') == ['box piunavailable'])])

The CSS selector attempt

size = soup2.find('div', attrs={'class': 'psizeoptioncontainer'})
getsize = str([e.get_text() for e in size.findAll('a[class="box"]')])

So, for the URL provided, I'm expecting the result to be a string (converted from a list) containing all available sizes. At the time of writing this question, that should be: '8', '8.5', '9', '9.5', '10', '10.5', '11', '11.5', '13'

Instead, I'm getting all sizes: '7.5', '8', '8.5', '9', '9.5', '10', '10.5', '11', '11.5', '12', '13'

Anyone have an idea how to make it work (or know an elegant solution to my issue)? Thank you in advance!

QHarr · Accepted Answer

You want the CSS :not() pseudo-class selector to exclude the other class. I'm using bs4 4.7.1.

sizes = [item.text for item in soup.select('.box:not(.piunavailable)')]
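To see the selector at work without hitting the site, here is a minimal offline sketch. The HTML snippet is made up to match the structure described in the question (a psizeoptioncontainer div holding a elements with class box or box piunavailable); the real page's markup may differ, and html.parser is used only to avoid needing lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the question's description,
# not copied from the real page.
html = """
<div class="psizeoptioncontainer">
  <a class="box piunavailable">7.5</a>
  <a class="box">8</a>
  <a class="box">8.5</a>
  <a class="box piunavailable">12</a>
  <a class="box">13</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# a.box matches every <a> whose class list contains "box";
# :not(.piunavailable) then drops the ones that also carry "piunavailable".
selector = ".psizeoptioncontainer a.box:not(.piunavailable)"
sizes = [a.get_text(strip=True) for a in soup.select(selector)]
print(sizes)  # ['8', '8.5', '13']
```

Scoping the selector to the container is optional; it just guards against other elements on the page that happen to use the box class.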

In full:

import requests
from bs4 import BeautifulSoup

r = requests.get('http://www.jimmyjazz.com/mens/footwear/jordan-retro-13--atmosphere-grey-/414571-016?color=Grey')
soup = BeautifulSoup(r.content, 'lxml')
sizes = [item.text for item in soup.select('.box:not(.piunavailable)')]
print(sizes)
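If you would rather stay with find_all, a function filter can also work, as long as it treats class as a list. Beautiful Soup parses class as a multi-valued attribute, so tag.get('class') gives ['box', 'piunavailable'], which never equals ['box piunavailable']. The sketch below is one possible alternative, not the approach above; the HTML and the is_available helper are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed to resemble the page in the question.
html = """
<div class="psizeoptioncontainer">
  <a class="box piunavailable">7.5</a>
  <a class="box">8</a>
  <a class="box">9.5</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# class is multi-valued, so it comes back as a list of separate names.
unavailable_classes = soup.find("a", class_="piunavailable").get("class")
print(unavailable_classes)  # ['box', 'piunavailable']


def is_available(tag):
    """Illustrative helper: True for <a class="box"> without piunavailable."""
    classes = tag.get("class") or []
    return tag.name == "a" and "box" in classes and "piunavailable" not in classes


available = [a.get_text(strip=True) for a in soup.find_all(is_available)]
print(available)  # ['8', '9.5']
```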
