Reputation: 85
I am confused by the class attribute on list items inside an unordered list.
I am writing a Python program to crawl a website, targeting the li elements inside a ul list. There are 45 li elements inside the ul, 17 of which have no "class" attribute assigned to them. Here is a portion of the ul.
My customized target selector is "ul.vacanciesList li", and I only get the 17 that don't have the "class" keyword.
My question is: what is that "class" keyword that appears in the markup for the li elements, and how can I target the li elements so that I get all 45 of them, not only the ones without a class?
Customized code:
'title' => ['selector' => 'h3'],
'containerSelector' => 'ul.vacanciesList li',
'detailSelector' => '#bigbox',
'location' => ['selector' => 'div.place'],
Thank you.
Upvotes: 1
Views: 294
Reputation: 2540
An empty attribute (an attribute without a value) is valid. <tag class=""> or <tag class> just means the class attribute is present but empty, so the element has no class names assigned to it. Read this answer for more details.
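To make the point about empty attributes concrete, here is a minimal sketch using only Python's standard-library HTMLParser. The markup and the class names "x" and "y" are made up for illustration and are not taken from the page in the question. It records, for each li, whether a class attribute is present and which class names it carries.

```python
from html.parser import HTMLParser


class ClassAttrCollector(HTMLParser):
    """Records what each <li> start tag says about its class attribute."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        if tag != "li":
            return
        attr_map = dict(attrs)
        if "class" not in attr_map:
            self.seen.append("no class attribute")
        else:
            # A bare `class` arrives as None and `class=""` as an empty
            # string; both split into an empty list of class names.
            self.seen.append((attr_map["class"] or "").split())


# Hypothetical markup, not the real page
sample = '<ul><li>a</li><li class>b</li><li class="">c</li><li class="x y">d</li></ul>'

collector = ClassAttrCollector()
collector.feed(sample)
print(collector.seen)
```

In this sketch the bare and the empty class attribute both end up as an empty list of class names, which is different from the attribute being absent altogether.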
To find the list items:
import bs4
# page is assumed to hold the HTML of the crawled page
soup = bs4.BeautifulSoup(page, 'lxml')
litems = soup.findAll('li', {'class' : ''})
Or, you can find the ul tag, which does have a class attribute value assigned to it, and get all the list items from there.
soup = bs4.BeautifulSoup(page, 'lxml')
# get the unordered list of interest
unordered_list = soup.find('ul', {'class' : 'article vacanciesList'})
# extract all the list items from them
list_items = unordered_list.findAll('li')
print(list_items)
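As a variation on the same idea, the selector from the question can also be handed to BeautifulSoup's select method, which takes CSS selectors. The sketch below assumes bs4 is installed and uses made-up markup (the job titles and the "featured" class are hypothetical). It uses the built-in 'html.parser' instead of 'lxml' only so that it runs without an extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page
html = """
<ul class="article vacanciesList">
  <li><h3>Job A</h3></li>
  <li class><h3>Job B</h3></li>
  <li class=""><h3>Job C</h3></li>
  <li class="featured"><h3>Job D</h3></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# The descendant selector constrains only the ul's class, so every li
# under that ul is selected, whatever its own class attribute looks like.
items = soup.select("ul.vacanciesList li")
titles = [li.h3.get_text(strip=True) for li in items]
print(len(items), titles)
```

With this sample, all four list items are selected, including the ones with a bare, empty, or non-empty class attribute.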
Upvotes: 1