Karthik

Reputation: 2581

Python BeautifulSoup: Insert attribute to tags

I'm trying to add a new attribute to every table in an HTML document, including the nested ones. I'm using the code below, but it isn't adding the attribute to all of the table tags. I would really appreciate any help.

Input HTML:

<html>
<head>
<title>Test</title>
</head>
<body>
<div>
<table>
<tr>t<td><table></table></td></tr>
<tr>t<td><table></table></td></tr>
<tr>t<td><table></table></td></tr>
</table>
</div>
</body>
</html>

Code:

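from bs4 import BeautifulSoup
import urllib.request  # Python 3 name for the old urllib2 module

html = urllib.request.urlopen("file://xxxxx.html").read()

# Name the parser explicitly so the result doesn't depend on which parsers are installed
soup = BeautifulSoup(html, "html.parser")

for tag in soup.find_all(True):
    if tag.name == "table":
        tag['attr'] = 'new'
        print(tag)
    else:
        print(tag.contents)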

Desired output HTML:

<html>
<head>
<title>Test</title>
</head>
<body>
<div>
<table attr="new">
<tr>t<td><table attr="new"></table></td></tr>
<tr>t<td><table attr="new"></table></td></tr>
<tr>t<td><table attr="new"></table></td></tr>
</table>
</div>
</body>
</html>

Upvotes: 0

Views: 967

Answers (1)

200_success

Reputation: 7582

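Your tag['attr'] = 'new' assignment seems to work correctly. The problem is the printing. find_all(True) yields tags in document order, so every ancestor (<html>, <body>, <div>, the outer <table>) is visited before its descendants. Printing an ancestor (or its .contents) serializes its whole subtree at that moment, including the nested tables the loop hasn't reached yet, so those tables are printed without the new attribute.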
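A small sketch that reproduces the effect, assuming Python 3 and the built-in "html.parser" backend. The input is a trimmed-down version of the question's markup (the stray "t" text nodes are left out), and the snapshots list is just an illustrative stand-in for the question's print calls.

```python
from bs4 import BeautifulSoup

# Trimmed-down version of the question's input: one outer table with one nested table.
html = "<div><table><tr><td><table></table></td></tr></table></div>"
soup = BeautifulSoup(html, "html.parser")

snapshots = []
for tag in soup.find_all(True):  # document order: div, outer table, tr, td, inner table
    if tag.name == "table":
        tag["attr"] = "new"
    # Serialize the tag right now, the way the question's print() calls do.
    snapshots.append(str(tag))

for s in snapshots:
    print(s)

# The div and the outer table were serialized before the inner table was modified,
# but the document itself ends up fully updated.
print(soup)
```

The first two snapshots show the inner <table> without attr="new", while the final print(soup) shows the attribute on both tables.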

The simple fix is to make one pass to modify the document first, then make just one print(soup) call at the end.
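A minimal sketch of that two-pass approach, assuming Python 3 and the built-in "html.parser" backend. The helper name add_table_attr is hypothetical, and it uses find_all("table") instead of find_all(True) plus a name check, since find_all("table") already returns every table at any depth.

```python
from bs4 import BeautifulSoup


def add_table_attr(html, name="attr", value="new"):
    """Hypothetical helper: set name=value on every <table>, nested ones included."""
    soup = BeautifulSoup(html, "html.parser")
    # Pass 1: modify. find_all("table") searches all descendants, not just direct children.
    for table in soup.find_all("table"):
        table[name] = value
    # Pass 2: serialize once, after every table has been modified.
    return str(soup)


if __name__ == "__main__":
    sample = (
        "<div><table>"
        "<tr><td><table></table></td></tr>"
        "<tr><td><table></table></td></tr>"
        "</table></div>"
    )
    print(add_table_attr(sample))
```

To process the question's file, read its contents first and pass that string (or bytes) to the helper.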

Upvotes: 1
