Python: extract class and text

Question

I would like to extract data from a website, and I need to know whether it lists certain equipment. In the example below, A has CD but does not have CDA.

HTML:

My code:

import requests
from bs4 import BeautifulSoup

res = requests.get('https://www.acd.com/carinfo-4434.php')
soup = BeautifulSoup(res.text, 'lxml')
for item in soup.find_all(attrs={'class': 'ABC'}):
    for link in item.find_all('li'):
        print(link)

My code extracts every li element from the HTML, like this:

CD
VCD
CDA 

    b11


    b22

But that's not what I want. I want to extract both the li element's class and its text, so that the result looks like this:

specChecked, CD
specChecked, VCD
, CDA

(Or I could simply represent specChecked as 1 and an empty class as 0.)
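A minimal sketch of the 1/0 variant, assuming the equipment items are li elements inside a container with class ABC and that a fitted item carries class="specChecked". The question's original HTML was not preserved, so the markup below is a hypothetical stand-in, and `equipment_flags` is an illustrative helper name, not part of any library:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the question's real HTML was lost, so this only
# mimics the structure implied by the desired output.
HTML = """
<div class="ABC">
  <ul>
    <li class="specChecked">CD</li>
    <li class="specChecked">VCD</li>
    <li class="">CDA</li>
  </ul>
</div>
"""


def equipment_flags(html):
    """Return (flag, name) pairs: 1 if the <li> has class specChecked, else 0."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # CSS selector: every <li> that is a descendant of an element with class ABC.
    for li in soup.select(".ABC li"):
        # BeautifulSoup treats "class" as multi-valued, so .get() returns a list.
        classes = li.get("class", [])
        flag = 1 if "specChecked" in classes else 0
        rows.append((flag, li.get_text(strip=True)))
    return rows


for flag, name in equipment_flags(HTML):
    print(f"{flag}, {name}")  # prints "1, CD", "1, VCD", "0, CDA"
```

Testing membership in the class list, rather than comparing the whole attribute as a string, keeps the check working if an item ever carries several classes.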

Rakesh · Accepted Answer

s = """
    A
    
        CD
        VCD
        CDA                       
    
    B
    
        
        
            b11
        
        
            b22
        
        
    
"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(s, "html.parser")
for link in soup.find_all('li'):
    if link.has_attr("class"):
        print(link.get("class", ""), link.text)

Output:

[u'specChecked'], u'CD'
[u'specChecked'], u'VCD'
[u''], u'CDA'

Use has_attr to check whether an li has a class attribute, link.get to get the class value, and link.text to extract the text.
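The has_attr / get / text approach can be sketched as a self-contained function that also produces the question's "class, text" format. The markup is hypothetical (the original HTML was lost), and `class_and_text` is an illustrative name. BeautifulSoup returns class as a list, which is why the answer's output shows brackets; joining the list gives a plain string:

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the answer's output: two checked items,
# one item with an empty class, and one <li> with no class attribute at all.
HTML = """
<ul>
  <li class="specChecked">CD</li>
  <li class="specChecked">VCD</li>
  <li class="">CDA</li>
  <li>b11</li>
</ul>
"""


def class_and_text(html):
    """Return 'class, text' strings for every <li> that has a class attribute."""
    soup = BeautifulSoup(html, "html.parser")
    lines = []
    for li in soup.find_all("li"):
        if not li.has_attr("class"):
            continue  # e.g. the b11 item: no class attribute, so it is skipped
        # "class" comes back as a list; join it into one string for display.
        class_str = " ".join(li.get("class", []))
        lines.append(f"{class_str}, {li.get_text(strip=True)}")
    return lines


for line in class_and_text(HTML):
    print(line)
```

Depending on the BeautifulSoup version, an empty class attribute may come back as an empty list or as [''] (as in the answer's output); joining either gives an empty string, so CDA still prints as ", CDA".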
