Loop through BeautifulSoup list and parse each to HTML tags and data problem

Question

I'm a Python 3 programmer, new to BeautifulSoup and HTMLParser. I'm using BeautifulSoup to fetch all the definition list data from an HTML file, and I'm trying to store the dt and dd data in a Python dictionary as corresponding key-value pairs. My HTML file (List_page.html) is:



STH here

    
    
        Sine
        The ratio of the length of the opposite side to the length of the hypotenuse.
        Cosine
        The ratio of the length of the adjacent side to the length of the hypotenuse.

and my Python code is:

from bs4 import BeautifulSoup
from html.parser import HTMLParser

dt = []
dd = []
dl = {}

class DTParser(HTMLParser):
    def handle_data(self, data):
        dt.append(data)

class DDParser(HTMLParser):
    def handle_data(self, data):
        dd.append(data)

html_page = open("List_page.html")
soup = BeautifulSoup(html_page, features="lxml")

dts = soup.select("dt")
parser = DTParser()

# Start of part 1:
parser.feed(str(dts[0]).replace('\n', ''))
parser.feed(str(dts[1]).replace('\n', ''))
# end of part 1

dds = soup.select("dd")
parser = DDParser()

# Start of part 2
parser.feed(str(dds[0]).replace('\n', ''))
parser.feed(str(dds[1]).replace('\n', ''))
# End of part 2

dl = dict(zip(dt, dd))
print(dl)

output is:

This prints the expected result. However, when I replace part 1 (or part 2) with a for loop, it goes wrong:

For example, this code:

# Similar change for part 2
for dt in dts:
    parser.feed(str(dts[0]).replace('\n', ''))

In this case it only gives me the definition of Cosine, not Sine. With two items I can do this without a loop, but what if I have more? I'd like to know the correct way to do this. Thanks.

alec · Accepted Answer

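Inside the for loop you index dts[0] on every iteration, so the first element is fed to the parser each time instead of the current one. Note also that naming the loop variable dt rebinds the global list dt that DTParser.handle_data appends to, so avoid reusing that name. Iterate by index instead: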

for i in range(len(dts)):
    parser.feed(str(dts[i]).replace('\n', ''))

and

for i in range(len(dds)):
    parser.feed(str(dds[i]).replace('\n', ''))
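
As a variation on the loops above, the collected text can live on the parser instance rather than in module-level lists, which also keeps loop variable names from clashing with those globals. The following is only a minimal sketch under assumptions: TextCollector and collect_text are hypothetical names that are not part of the original code, and the hard-coded fragment strings stand in for what str(tag) would give for each element of soup.select("dt") and soup.select("dd").

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: stores the text it sees on the instance."""

    def __init__(self):
        super().__init__()
        self.items = []

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only chunks between tags
            self.items.append(text)


def collect_text(fragments):
    """Feed every HTML fragment to one TextCollector and return its text items."""
    parser = TextCollector()
    for fragment in fragments:  # use the loop variable, not fragments[0]
        parser.feed(fragment)
    parser.close()
    return parser.items


# Assumed stand-ins for [str(tag) for tag in soup.select("dt")] and the dd equivalent.
dt_fragments = ["<dt>Sine</dt>", "<dt>Cosine</dt>"]
dd_fragments = [
    "<dd>The ratio of the length of the opposite side to the length of the hypotenuse.</dd>",
    "<dd>The ratio of the length of the adjacent side to the length of the hypotenuse.</dd>",
]

definitions = dict(zip(collect_text(dt_fragments), collect_text(dd_fragments)))
print(definitions)
```

This sketch assumes each fragment holds plain text only; a dd containing inline tags or character references could reach handle_data in several chunks and would need joining. If only the text is needed, BeautifulSoup's own tag.get_text() on each selected element should give the same strings without a second parser.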
