Hrvoje

Reputation: 15152

Beautiful Soup: getting all li elements from a ul where only the first li has a specific class name

I have an unordered list like this in HTML:

<ul> 
<li class="label">Equipement</li>
<li>Aluminum tyres</li>
<li>4x4</li>
<li>3. stop lights</li>
<li>Bluetooth</li>
</ul>

Only the first li element in the ul contains the title of the list; the other elements contain the features, which need to be extracted as plain text. I know how to locate that first li, but I don't know how to select all the other elements.

Note that this ul has no class, and it sits in an HTML document with a lot of other ul elements. I try to reach the list through its first li with:

 (li.previousSibling).get_text() 

but I cannot extract the elements with get_text(); I get:

AttributeError: 'NavigableString' object has no attribute 'get_text'

Also, I need to extract every li except the first one, which holds the title. There are several ul elements like this on the page, and they vary in length (each has a different number of li elements).

EDIT

My code so far. I'm finding elements with:

 carBasics = soup.select('li.label')

 for li in carBasics:
     if li.contents[0] == "Equipement":
         carAdditionalEquipement = (li.previousSibling).find_all('li')

AttributeError: 'NavigableString' object has no attribute 'get_text'

Upvotes: 0

Views: 1348

Answers (4)

Hrvoje

Reputation: 15152

The idea is to omit the first li. No answer addressed that directly, so this is how I did it in the end:

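for item in soup.select("ul li.labela"):
    # "labela" is the class used on the real page; the sample HTML in the question uses "label"
    if item.text == "Equipement":
        # Slice off the label text (plus one character for the adjacent newline),
        # then split the remainder into lines
        carAdditionalEquipement = item.parent.text[len(item.contents[0])+1:].strip().splitlines()

This gives me a clean list without the first line, which is removed by [len(item.contents[0])+1:].

Basically, I chop the length of the first element off the text of the whole list and then split the rest, since there is a newline character at the end of each list item.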
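For anyone who wants to try this on the sample HTML from the question, here is a minimal sketch of a close variant: instead of slicing by character count, it splits the list text into lines first and then drops the first line. It assumes the class is "label" as in the sample, and that every li sits on its own line in the markup; the variable name features is just for illustration.

```python
from bs4 import BeautifulSoup

html = """<ul>
<li class="label">Equipement</li>
<li>Aluminum tyres</li>
<li>4x4</li>
<li>3. stop lights</li>
<li>Bluetooth</li>
</ul>"""

soup = BeautifulSoup(html, "html.parser")

features = []
for item in soup.select("ul li.label"):
    if item.get_text(strip=True) == "Equipement":
        # Text of the whole <ul>: one line per <li>, the label comes first.
        lines = item.parent.get_text().strip().splitlines()
        # Skip the label line and ignore any blank lines.
        features = [line.strip() for line in lines[1:] if line.strip()]

print(features)
```

If the markup is minified (no newlines between the li elements), this line-based approach will not work, and selecting the sibling li tags directly is the safer route.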

Upvotes: 0

QHarr

Reputation: 84455

Use the CSS general sibling combinator (~). With bs4 4.7.1+ you can also use :contains to match the label text, if it is known:

from bs4 import BeautifulSoup as bs

html = '''
<ul> 
<li class="label">Equipement</li>
<li>Aluminum tyres</li>
<li>4x4</li>
<li>3. stop lights</li>
<li>Bluetooth</li>
</ul>
'''
soup = bs(html, 'lxml')
print([li.text for li in soup.select('.label:contains("Equipement") ~ li')])
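A note for newer installs: as far as I know, recent versions of soupsieve (the selector engine behind select(), version 2.1 and later) deprecate :contains in favour of :-soup-contains. A sketch of the same selector in that spelling, assuming such a version is installed, and using html.parser so that lxml is not required:

```python
from bs4 import BeautifulSoup

html = """<ul>
<li class="label">Equipement</li>
<li>Aluminum tyres</li>
<li>4x4</li>
<li>3. stop lights</li>
<li>Bluetooth</li>
</ul>"""

soup = BeautifulSoup(html, "html.parser")

# "~ li" selects every later <li> sibling of the matched label,
# so the label itself is never part of the result.
selector = 'li.label:-soup-contains("Equipement") ~ li'
items = [li.get_text(strip=True) for li in soup.select(selector)]

print(items)
```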

Upvotes: 1

KunduK

Reputation: 33384

Use find_next_siblings()

from bs4 import BeautifulSoup

html='''<ul>
<li class="label">Equipement</li>
<li>Aluminum tyres</li>
<li>4x4</li>
<li>3. stop lights</li>
<li>Bluetooth</li>
</ul>
<ul>
<li class="label">Equipement</li>
<li>Aluminum tyres</li>
<li>4x4</li>
<li>3. stop lights</li>
<li>Bluetooth</li>
</ul>'''
soup = BeautifulSoup(html, 'lxml')
for item in soup.select("ul li.label"):
    if item.text=="Equipement":
        siblings=[s.text for s in item.find_next_siblings('li')]
        print(siblings)
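As for the AttributeError in the question: the node directly before the label li is not a tag but the whitespace text between <ul> and the first <li>, which Beautiful Soup represents as a NavigableString. A small sketch to illustrate this on the sample markup (html.parser is used here only to avoid the lxml dependency):

```python
from bs4 import BeautifulSoup, NavigableString

html = """<ul>
<li class="label">Equipement</li>
<li>Aluminum tyres</li>
<li>4x4</li>
<li>3. stop lights</li>
<li>Bluetooth</li>
</ul>"""

soup = BeautifulSoup(html, "html.parser")
label = soup.select_one("li.label")

# The previous sibling is the newline after <ul>, i.e. a text node, not a tag.
before = label.previous_sibling
print(type(before).__name__)

# The enclosing list is reached through .parent, not through a sibling.
print(label.parent.name)

# find_next_siblings('li') returns only <li> tags and skips the text nodes.
features = [li.get_text(strip=True) for li in label.find_next_siblings("li")]
print(features)
```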

Edited the answer:

import requests
from bs4 import BeautifulSoup
html = requests.get('https://www.index.hr/oglasi/bmw-serija-5-3-0-xd/oid/1971034')

soup = BeautifulSoup(html.content, 'html.parser')

for item in soup.select("ul li.labela"):
    if item.text == "Dodatna oprema vozila":
        siblings = [s.text for s in item.find_next_siblings('li')]
        print(siblings)
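Since the question mentions several such lists of varying length on one page, the same idea can be wrapped in a small helper that collects every labelled list at once. This is only a sketch: lists_by_label is a made-up name, the "Safety" list is invented sample data, and the selector assumes the "label" class from the question's sample HTML (pass "li.labela" for the real page).

```python
from bs4 import BeautifulSoup


def lists_by_label(soup, label_selector="li.label"):
    """Map each label's text to the texts of the <li> siblings that follow it."""
    result = {}
    for label in soup.select(label_selector):
        title = label.get_text(strip=True)
        result[title] = [
            li.get_text(strip=True) for li in label.find_next_siblings("li")
        ]
    return result


html = """<ul>
<li class="label">Equipement</li>
<li>Aluminum tyres</li>
<li>4x4</li>
</ul>
<ul>
<li class="label">Safety</li>
<li>ABS</li>
</ul>
<ul>
<li>No label here</li>
</ul>"""

soup = BeautifulSoup(html, "html.parser")
result = lists_by_label(soup)
print(result)
```

Lists without a label li are simply ignored, and two lists with the same label text would overwrite each other in the dictionary, so adjust that if your page repeats titles.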

Upvotes: 1

from bs4 import BeautifulSoup
import requests

html = requests.get(
    'yoururl')

soup = BeautifulSoup(html.content, 'html.parser')

for li in soup.select('ul li.labela'):
  if li.contents[0]=="Equipement":
    print(li.parent.text)

Upvotes: 1
