Stefan Badertscher

Reputation: 341

Extract <li> tag from BeautifulSoup resultSet

I want to extract all the <li> tags from an HTML page. The content I need can be located with result = soup.find('div', {'class':'column column_620 column_content'}), which returns the first div with that class. I then get its siblings and want to extract the <li> tags from them, but that result does not have the method findAll(). How can I extract the desired <li> entries?

import re
import time
from datetime import datetime
import platform
import pandas as pd
from numpy import nan
from itertools import chain

from bs4 import BeautifulSoup, Tag  # Tag is needed for the isinstance check below
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

timestampStart = datetime.now().strftime("%Y-%m-%d %H:%M:%S")

sdCel = 'http://www.linguista.ch/sprachschule/san-diego-cel/'

#browser = webdriver.PhantomJS() # headless
browser = webdriver.Chrome() # run it with chrome browser appearing
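browser.get(sdCel)
# Assumption: the original snippet never defines soup; parse the rendered page here
soup = BeautifulSoup(browser.page_source, 'html.parser')
sellingPoints = soup.find('div', {'class':'column column_620 column_content'})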
points_ul = sellingPoints.find_next_siblings()
#points_ul = sellingPoints.parent.find_next_sibling()

for item in points_ul.findAll('li'): #this gives error
    if isinstance(item, Tag):
        print(item.text)

This gives the following error: AttributeError: 'ResultSet' object has no attribute 'findAll'

This is the part of points_ul which I have to retrieve:

<div class="column column_620 column_content">
 <h3>Weshalb wir College of English Language für einen Sprachaufenthalt empfehlen:</h3>
 <p></p><ul>
  <li>Beste Lage im Stadtzentrum von San Diego</li>
  <li>Sprachschule mit familärer Atmosphäre</li>
  <li>Von der Terrasse aus geniessen Sie einen tollen Blick über die Stadt</li>
  <li>Kleine Klasen mit max. 10 Teilnehmern</li>
  <li>Hervorragendes Preis- / Leistungsverhältnis</li>
 </ul><p></p>
</div>

Upvotes: 2

Views: 990

Answers (1)

Fernando Cezar

Reputation: 848

That's because sellingPoints.find_next_siblings() returns a ResultSet, which behaves like a list of tags, and the list itself has no findAll method. Only the individual tags inside it do.

Iterate over the ResultSet first, then call findAll('li') on each of its elements.
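A minimal sketch of that approach, run against a small inline HTML string rather than the live page. The html string is a hypothetical stand-in for the real markup (it assumes the list lives in a sibling div of the first match), and find_all is the current Beautiful Soup spelling of findAll:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: two sibling divs with the same
# class, where the second one holds the <ul> we are after.
html = """
<div class="column column_620 column_content"><h3>Intro</h3></div>
<div class="column column_620 column_content">
  <ul>
    <li>Beste Lage im Stadtzentrum von San Diego</li>
    <li>Kleine Klasen mit max. 10 Teilnehmern</li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
selling_points = soup.find("div", {"class": "column column_620 column_content"})

# find_next_siblings() returns a ResultSet (a list of tags), so loop over it
# and search inside each tag instead of calling find_all on the list itself.
items = []
for sibling in selling_points.find_next_siblings():
    for li in sibling.find_all("li"):
        items.append(li.get_text(strip=True))

for text in items:
    print(text)
```

If the <li> tags are actually inside the first matched div rather than in its siblings, calling selling_points.find_all('li') directly on that tag would be enough.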

Upvotes: 3
