Scraping string from HTML with python3-beautifulsoup3

Question

I'm trying to get strings from table rows using BeautifulSoup. The strings I want are 'SANDAL' and 'SHORTS', from the second and third rows. I know this can be solved with a regular expression or with string functions, but I want to learn BeautifulSoup and do as much as possible with it.

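Clipped Python code

    from bs4 import BeautifulSoup

    # `page` holds the HTML shown below
    soup = BeautifulSoup(page, 'html.parser')
    table = soup.find('table')
    row = table.find_next('tr')  # first <tr> in the table (the header)
    row = row.find_next('tr')    # next <tr> (the SANDAL row)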

HTML

    PRODUCT ID TYPE WHEN ID ID
    SANDAL 77313 wear new id 878717
    SHORTS 77314 wear new id 878718
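Here is one way to finish the `find_next` approach from the clipped code, as a sketch. The original table markup was lost from this page, so the `page` string is a hypothetical reconstruction. The `<table id="products">` layout, the `<th>` header cells, and the column split (guessed from the whitespace-separated text) are all assumptions. Starting from the header row, `find_next_sibling('tr')` walks the remaining rows of the same table, and `find('td')` picks each row's first cell.

```python
from bs4 import BeautifulSoup

# Hypothetical reconstruction of the question's table: the real markup was
# lost, so the tags, the <th> header and the column split are assumptions.
page = """
<table id="products">
  <tr><th>PRODUCT</th><th>ID</th><th>TYPE</th><th>WHEN</th><th>ID</th><th>ID</th></tr>
  <tr><td>SANDAL</td><td>77313</td><td>wear</td><td>new</td><td>id</td><td>878717</td></tr>
  <tr><td>SHORTS</td><td>77314</td><td>wear</td><td>new</td><td>id</td><td>878718</td></tr>
</table>
"""

soup = BeautifulSoup(page, "html.parser")
table = soup.find("table")
row = table.find_next("tr")          # first <tr> in the table: the header row

names = []
row = row.find_next_sibling("tr")    # step to the first data row
while row is not None:
    # first <td> of the row, with surrounding whitespace stripped
    names.append(row.find("td").get_text(strip=True))
    row = row.find_next_sibling("tr")  # None once the table's rows run out

print(names)  # ['SANDAL', 'SHORTS']
```

`find_next('tr')` can also step from row to row, but it searches everything that follows in the document, so it may run past the end of this table into another one. `find_next_sibling('tr')` stays among the rows of the same table.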

Andrej Kesely · Accepted Answer

To get the text from the first column of the table (skipping the header row), you can use this script:

    from bs4 import BeautifulSoup


    txt = '''
        PRODUCT ID TYPE WHEN ID ID
        SANDAL 77313 wear new id 878717
        SHORTS 77314 wear new id 878718
        '''

    soup = BeautifulSoup(txt, 'lxml')  # <-- lxml is important here (to parse the HTML code correctly)

    for tr in soup.find('table', id='products').find_all('tr')[1:]:  # <-- [1:] because we want to skip the header
        print(tr.td.text)                                            # <-- print contents of first <td> tag

Prints:

    SANDAL
    SHORTS
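For reference, here is a self-contained version of the script that runs as-is. The question's markup did not survive, so the `txt` table below is a hypothetical reconstruction. Its tags, the `<th>` header, and the column split are assumptions. It is parsed with the built-in `html.parser` instead of `lxml`. That swap works for this well-formed sample, but the answer's note about `lxml` may still matter for the original HTML. The last query shows an equivalent CSS-selector version using `select()`.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the question's table (real markup was lost):
# tags, the <th> header row and the column split are assumptions.
txt = """
<table id="products">
  <tr><th>PRODUCT</th><th>ID</th><th>TYPE</th><th>WHEN</th><th>ID</th><th>ID</th></tr>
  <tr><td>SANDAL</td><td>77313</td><td>wear</td><td>new</td><td>id</td><td>878717</td></tr>
  <tr><td>SHORTS</td><td>77314</td><td>wear</td><td>new</td><td>id</td><td>878718</td></tr>
</table>
"""

# html.parser ships with Python; used here instead of lxml (a swap)
soup = BeautifulSoup(txt, "html.parser")

# Same logic as the answer: skip the header row with [1:], take each row's first <td>
names = [tr.td.get_text(strip=True)
         for tr in soup.find("table", id="products").find_all("tr")[1:]]

# CSS-selector equivalent; relies on the header using <th>, so that
# td:first-child only matches in the data rows
names_css = [td.get_text(strip=True)
             for td in soup.select("table#products tr > td:first-child")]

print(names)      # ['SANDAL', 'SHORTS']
print(names_css)  # ['SANDAL', 'SHORTS']
```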
