Reputation: 1233
Given the object soup with the value bs4.BeautifulSoup("<tr><td>Hello!</td><td>World!</td></tr>"), how do I remove exclamation marks from all tr tags?
The closest I have got is:
for tr in soup.find_all("tr"):
    tr.string = tr.decode_contents().replace("!", "")
But this results in:
<html><body><tr>&lt;td&gt;Hello&lt;/td&gt;&lt;td&gt;World&lt;/td&gt;</tr></body></html>
The angle brackets in the output of decode_contents() are escaped when it is assigned to tr.string.
I have also tried tr.replace_with(str(tr).replace("!", "")) (using the HTML representation of Tag objects), which gives the same result.
Bear in mind this is a simplified example. While I could iterate over the td tags instead in this specific example, in reality those tags would also contain HTML structures, presenting the same problem.
Upvotes: 1
Views: 2062
Reputation: 1233
Did the following:
import bs4
soup = bs4.BeautifulSoup("<tr><td>Hello!</td><td>World!</td></tr>", "html.parser")
for tr in soup.find_all("tr"):
    replaced_tr = str(tr).replace("!", "")
    modified_tr = bs4.BeautifulSoup(replaced_tr, "html.parser").tr
    tr.replace_with(modified_tr)
It seems replace_with does not parse strings of HTML (a string argument is inserted as plain text), so you should create a BeautifulSoup object first and pass the tag from it as the argument of replace_with.
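To illustrate the difference, here is a minimal sketch comparing the two calls. It assumes the html.parser backend and wraps the row in a table; the names as_text and as_tag are only for illustration.

```python
import bs4

html = "<table><tr><td>Hello!</td><td>World!</td></tr></table>"

# Case 1: pass a plain string to replace_with.
# The string is inserted as text, so its angle brackets are escaped on output.
soup = bs4.BeautifulSoup(html, "html.parser")
tr = soup.find("tr")
tr.replace_with(str(tr).replace("!", ""))
as_text = str(soup)

# Case 2: parse the modified markup first, then pass the resulting Tag.
soup = bs4.BeautifulSoup(html, "html.parser")
tr = soup.find("tr")
fragment = bs4.BeautifulSoup(str(tr).replace("!", ""), "html.parser")
tr.replace_with(fragment.tr)
as_tag = str(soup)

print(as_text)
print(as_tag)
```

Note that calling str.replace on the serialized markup also touches any "!" inside attribute values or comments, which may or may not be what you want.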
Upvotes: 0
Reputation: 3518
You could try iterating through all the string objects that are descendants of <tr>.
import bs4
soup = bs4.BeautifulSoup("<table><tr><td>Hello!</td><td>World!</td></tr></table>", "html.parser")
for tr in soup.find_all("tr"):
    # copy the strings into a list so replacing them does not break iteration
    strings = list(tr.strings)
    for s in strings:
        new_str = s.replace("!", "")
        s.replace_with(new_str)
One issue is that you can't replace the strings returned by .strings without breaking the iterator, which is why I made it a list first. If that's an issue you could iterate in a way that fetches the next element before you replace the current one, like so:
def iter_strings(elem):
    # iterate strings so that they can be replaced:
    # fetch the next string before yielding the current one
    strings = elem.strings
    n = next(strings, None)
    while n is not None:
        current = n
        n = next(strings, None)
        yield current

def replace_strings(element, substring, newstring):
    # replace every occurrence of `substring` with `newstring`
    for string in iter_strings(element):
        new_str = string.replace(substring, newstring)
        string.replace_with(new_str)

for tr in soup.find_all("tr"):
    replace_strings(tr, "!", "")
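Since the question mentions that the real td tags contain nested HTML, here is a small check of the string-by-string approach on such input. The markup is made up for illustration, and html.parser is assumed as the backend.

```python
import bs4

# Hypothetical input: nested tags inside the cells, plus a "!" in an attribute.
html = (
    '<table><tr><td><b>Hello!</b> there!</td>'
    '<td><a href="/x!">World!</a></td></tr></table>'
)
soup = bs4.BeautifulSoup(html, "html.parser")

for tr in soup.find_all("tr"):
    # Snapshot the strings first so replacing them does not disturb iteration.
    for s in list(tr.strings):
        s.replace_with(s.replace("!", ""))

result = str(soup)
print(result)
```

Only text nodes are edited here, so the nested tags stay intact and the "!" in the href attribute is left alone.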
Upvotes: 2