Ken

Reputation: 61

Replace text in td using BeautifulSoup

I am trying to parse a table and edit the text in each cell according to what I have in a QTableWidget, while keeping as much of the original styling as possible. The formatting of the td entries is inconsistent, and I can't figure out a good way to edit the text so that it comes out the way I expect.

soup = BeautifulSoup(body, "html.parser")
tables = soup.find_all('table')
for table in tables:
    rows = table.find_all('tr')
    for r, row in enumerate(rows[1:]):
        cols = row.find_all('td')
        for c, ele in enumerate(cols):
            print(ele)
            # If I directly do
            # ele.string = tableWidget.item(r, c).text()
            # the text changes as expected, but the cell loses all its styling, such as the hyperlink,
            # e.g. the entire <p> tag is lost and I end up with <td style="padding:.75pt .75pt .75pt .75pt">07:00</td>

            # If I try the below, it doesn't work. ele.has_attr('p') is always False for some reason, even though the td has a <p> tag
            if ele.has_attr('p'):    
                if not ele['p'].has_attr('a'):
                    ele[p].string = tableWidget.item(r, c).text()
            else:
                ele.string = tableWidget.item(r, c).text()

Below is the output of ele; the text enclosed in ** is what I am trying to replace:

<td style="padding:.75pt .75pt .75pt .75pt"><p class="MsoNormal"><span style='font-family:"Calibri",sans-serif'><a href="https:/random link" target="_blank">**test**</a><o:p></o:p></span></p></td>
<td style="padding:.75pt .75pt .75pt .75pt"><p class="MsoNormal"><span style='font-family:"Calibri",sans-serif'>**07:00**<o:p></o:p></span></p></td>
<td style="padding:.75pt .75pt .75pt .75pt"><p class="MsoNormal"><span style='font-family:"Calibri",sans-serif'>**08:00**<o:p></o:p></span></p></td>
<td style="padding:.75pt .75pt .75pt .75pt">****</td>

Upvotes: 0

Views: 763

Answers (2)

chitown88

Reputation: 28565

Seems like you understand a and p are tags:

"it doesn't work. ele.has_attr('p') is always false for some reason even though the td has <p> tag"

but then you are using .has_attr(), which only checks attributes.

p and a are not attributes; they are element tags. So you want to check whether the <td> has a <p> tag within it, and whether that <p> tag has an <a> tag.

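So remove:

if ele.has_attr('p'):
    if not ele['p'].has_attr('a'):

and replace with:

if ele.find('p'):
    if not ele.find('p').find('a'):

Likewise, ele[p].string on the following line should become ele.find('p').string, because indexing a tag with [] looks up an attribute, not a child tag.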
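
To make the attribute versus child tag distinction concrete, here is a minimal sketch. The HTML snippet is a shortened, made-up version of the cells in the question, and the variable names are only for illustration.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for one of the cells in the question (assumed markup).
html = '<td style="padding:.75pt"><p class="MsoNormal"><a href="#">test</a></p></td>'
td = BeautifulSoup(html, "html.parser").td

# has_attr() only looks at the tag's own attributes (style, class, href, ...).
has_style_attr = td.has_attr("style")   # style is an attribute of <td>
has_p_attr = td.has_attr("p")           # <p> is a child tag, not an attribute

# find() searches the tag's descendants and returns None when nothing matches.
has_p_child = td.find("p") is not None
has_a_in_p = td.find("p").find("a") is not None
```

This is why ele.has_attr('p') in the question is always False: the check never looks inside the cell.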

Upvotes: 0

folen gateis

Reputation: 2012

You're mixing up tags and attributes: td is a tag, and p is a tag too. p is not an attribute of td. style is an attribute of td, and class is an attribute of p.

You can use dot notation to chain find calls, like this:

    if ele.p:    
        if not ele.p.a:
            ele.p.string = tableWidget.item(r, c).text()
    else:
        ele.string = tableWidget.item(r, c).text()
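
Note that assigning to .string replaces everything inside that tag, so nested tags such as <span> are dropped, and cells containing a link are skipped entirely. If you want to keep all the nested markup, one possible approach is to replace only the text node itself. The following is a rough sketch under that assumption; replace_cell_text is a hypothetical helper name, not part of BeautifulSoup, and the sample HTML is a simplified version of the cells in the question.

```python
from bs4 import BeautifulSoup

def replace_cell_text(td, new_text):
    """Replace the visible text of a cell while keeping its nested tags."""
    # Collect the text nodes that contain something other than whitespace.
    text_nodes = [node for node in td.find_all(string=True) if node.strip()]
    if not text_nodes:
        # No text anywhere in the cell: set it directly.
        # (This would drop any empty nested tags the cell contains.)
        td.string = new_text
        return
    # Swap the first text node in place; its parent tags stay untouched.
    text_nodes[0].replace_with(new_text)
    # Remove any further text nodes so only the new text remains.
    for extra in text_nodes[1:]:
        extra.extract()

# Simplified stand-ins for the cells in the question (assumed markup).
link_html = ('<td style="padding:.75pt"><p class="MsoNormal"><span>'
             '<a href="https://example.com" target="_blank">test</a>'
             '<o:p></o:p></span></p></td>')
link_td = BeautifulSoup(link_html, "html.parser").td
replace_cell_text(link_td, "new")

empty_td = BeautifulSoup('<td style="padding:.75pt"></td>', "html.parser").td
replace_cell_text(empty_td, "09:00")
```

In the loop from the question, the call would be replace_cell_text(ele, tableWidget.item(r, c).text()) for every cell, with no need to branch on whether a <p> or <a> tag is present.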

Upvotes: 1
