Reputation: 1128
I have an HTML document that uses the text-transform style property to change case. When I see that style, I'd like to change the case of all text to which the style applies, retaining the HTML tags.
I have a partial solution that replaces the tag entirely. The approach that seems like it ought to be correct gives me AttributeError: 'NoneType' object has no attribute 'next_element'
Example:
from bs4 import BeautifulSoup, NavigableString, Tag
import re
html = '''
<div style="text-transform: uppercase;">
Foo0
<font>Foo0</font>
<div>Foo1
<div>Foo2</div>
</div>
</div>
'''
upper_patt = re.compile(r'(?i)text-transform:\s*uppercase')
# works, but replaces all text, removing the HTML tags
soup = BeautifulSoup(html, "html.parser")
for node in soup.find_all(attrs={'style': upper_patt}):
    node.replace_with(node.text.upper())
# does not work, raises AttributeError
soup = BeautifulSoup(html, "html.parser")
for node in soup.find_all(attrs={'style': upper_patt}):
    for txt in node.strings:
        txt.replace_with(txt.upper())
Upvotes: 0
Views: 70
Reputation: 44172
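It seems you want to uppercase the inner text of every descendant of an element styled with text-transform: uppercase.
Instead of replacing the nodes returned by find_all, loop over each node's descendant text with node.findChildren(text=True) and use replace_with() to change each string. Because findChildren returns a list rather than a lazy generator like node.strings, replacing strings inside the loop does not disturb the iteration: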
from bs4 import BeautifulSoup, NavigableString, Tag
import re
html = '''
<div style="text-transform: uppercase;">
Foo0
<font>Foo0</font>
<div>Foo1
<div>Foo2</div>
</div>
</div>
'''
upper_patt = re.compile(r'(?i)text-transform:\s*uppercase')
soup = BeautifulSoup(html, "html.parser")
for node in soup.find_all(attrs={'style': upper_patt}):
    # text=True selects only the text nodes among the descendants
    for child in node.findChildren(recursive=True, text=True):
        child.replace_with(child.text.upper())
print(soup)
Prints:
<div style="text-transform: uppercase;">
FOO0
<font>FOO0</font>
<div>FOO1
<div>FOO2</div>
</div>
</div>
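The loop from the question can probably also be kept almost as written. A likely cause of the AttributeError is that node.strings is a generator that walks the tree lazily, so replacing a string while iterating detaches the element the generator is standing on. The sketch below assumes that explanation and takes a snapshot with list() before replacing anything. The function name uppercase_styled_text is only an illustrative helper, not part of Beautiful Soup.

```python
import re

from bs4 import BeautifulSoup

html = '''
<div style="text-transform: uppercase;">
Foo0
<font>Foo0</font>
<div>Foo1
<div>Foo2</div>
</div>
</div>
'''

upper_patt = re.compile(r'(?i)text-transform:\s*uppercase')


def uppercase_styled_text(markup):
    """Hypothetical helper: uppercase text under uppercase-styled elements."""
    soup = BeautifulSoup(markup, "html.parser")
    for node in soup.find_all(attrs={'style': upper_patt}):
        # list() copies the text nodes up front, so replacing them
        # below does not interfere with the iteration.
        for txt in list(node.strings):
            txt.replace_with(txt.upper())
    return soup


result = uppercase_styled_text(html)
print(result)
```

On the sample input this should leave the tags in place and change only the text, the same as the findChildren version.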
Upvotes: 1