Reputation: 2581
I'm trying to add a new attribute to every table in an HTML document, including the nested ones. I'm using the code below, but it doesn't add the attribute to all of the table tags. I would really appreciate any help.
Input html code:
<html>
<head>
<title>Test</title>
</head>
<body>
<div>
<table>
<tr><td><table></table></td></tr>
<tr><td><table></table></td></tr>
<tr><td><table></table></td></tr>
</table>
</div>
</body>
</html>
Code:
from bs4 import BeautifulSoup
import urllib2
html = urllib2.urlopen("file://xxxxx.html").read()
soup = BeautifulSoup(html)
for tag in soup.find_all(True):
    if (tag.name == "table"):
        tag['attr'] = 'new'
        print(tag)
    else:
        print(tag.contents)
Desired output HTML:
<html>
<head>
<title>Test</title>
</head>
<body>
<div>
<table attr="new">
<tr><td><table attr="new"></table></td></tr>
<tr><td><table attr="new"></table></td></tr>
<tr><td><table attr="new"></table></td></tr>
</table>
</div>
</body>
</html>
Upvotes: 0
Views: 967
Reputation: 7582
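Your tag['attr'] = 'new' line works correctly. The problem is the printing inside the loop: each print call serializes the element together with all of its descendants as they are at that moment. Because find_all(True) returns tags in document order, ancestors such as the <div> (via print(tag.contents)) and the outer <table> (via print(tag)) are printed before the loop has reached the nested tables, so those printed copies show the nested tables without the attribute.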
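To see this snapshot effect, here is a minimal sketch assuming Python 3, bs4 with the built-in "html.parser", and the question's markup inlined instead of read from the file URL. The list snapshots is a hypothetical stand-in for the print calls: it records what each table serializes to at the moment the loop reaches it.

```python
from bs4 import BeautifulSoup

# Inline copy of the question's nested-table markup (assumed, not read from disk)
HTML = """<div>
<table>
<tr><td><table></table></td></tr>
<tr><td><table></table></td></tr>
<tr><td><table></table></td></tr>
</table>
</div>"""

soup = BeautifulSoup(HTML, "html.parser")

# Hypothetical stand-in for print(): store each table's serialization
# exactly as it looks when the loop visits it
snapshots = []
for tag in soup.find_all(True):
    if tag.name == "table":
        tag["attr"] = "new"
        snapshots.append(str(tag))

# The outer table was serialized before its nested tables were visited,
# so only its own opening tag carries the attribute in this snapshot
print(snapshots[0])

# The tree itself ends up fully modified once the loop finishes
print(soup)
```

The first printed block shows the nested tables as plain <table></table>, while the final print(soup) shows every table with attr="new".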
The simple fix is to modify the document in one pass first, then call print(soup) once at the end.
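A minimal sketch of that two-pass approach, again assuming Python 3, bs4 with "html.parser", and the markup inlined rather than loaded with urllib2. It swaps the find_all(True) plus name check for find_all("table"), which selects the same tags here.

```python
from bs4 import BeautifulSoup

# Inline copy of the question's document (assumed, not read from disk)
HTML = """<html><head><title>Test</title></head><body><div>
<table>
<tr><td><table></table></td></tr>
<tr><td><table></table></td></tr>
<tr><td><table></table></td></tr>
</table>
</div></body></html>"""

soup = BeautifulSoup(HTML, "html.parser")

# Pass 1: tag every <table>, outer and nested, without printing anything
for table in soup.find_all("table"):
    table["attr"] = "new"

# Pass 2: serialize once, after all modifications are in place
print(soup)
```

Because nothing is serialized until the loop has finished, every table, nested or not, appears with attr="new" in the output.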
Upvotes: 1