Peter

Reputation: 1128

BS4 - Replacing text content, preserving tags

I have an HTML document that uses the text-transform style property to change case. When an element has that style, I'd like to uppercase all of the text it contains while keeping the HTML tags intact.

I have a partial solution, but it replaces the tag entirely. The approach that seems like it ought to be correct raises AttributeError: 'NoneType' object has no attribute 'next_element'.

Example:

from bs4 import BeautifulSoup, NavigableString, Tag
import re

html = '''
<div style="text-transform: uppercase;">
    Foo0
    <font>Foo0</font>
    <div>Foo1
        <div>Foo2</div>
    </div>
</div>
'''
upper_patt = re.compile(r'(?i)text-transform:\s*uppercase')

# works, but replaces all text, removing the HTML tags
soup = BeautifulSoup(html, "html.parser")
for node in soup.find_all(attrs={'style': upper_patt}):
    node.replace_with(node.text.upper())

# does not work, throws AttributeError
soup = BeautifulSoup(html, "html.parser")
for node in soup.find_all(attrs={'style': upper_patt}):
    for txt in node.strings:
        txt.replace_with(txt.upper())
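A likely cause of the AttributeError is that the loop edits the tree while the node.strings generator is still walking it. replace_with() detaches each old string, so the generator loses its link to the next element. The following is a minimal sketch, under that assumption, that keeps the same approach and sample HTML and only takes a snapshot of the strings with list() before replacing them:

```python
import re
from bs4 import BeautifulSoup

html = '''
<div style="text-transform: uppercase;">
    Foo0
    <font>Foo0</font>
    <div>Foo1
        <div>Foo2</div>
    </div>
</div>
'''
upper_patt = re.compile(r'(?i)text-transform:\s*uppercase')
soup = BeautifulSoup(html, "html.parser")

for node in soup.find_all(attrs={'style': upper_patt}):
    # Copy the generator into a list first: replace_with() detaches each
    # string from the tree, which would break a live .strings traversal.
    for txt in list(node.strings):
        txt.replace_with(txt.upper())

print(soup)
```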

Upvotes: 0

Views: 70

Answers (1)

0stone0

Reputation: 44172

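It seems you want to uppercase the text of every descendant of an element that has text-transform: uppercase.

Instead of replacing the element returned by find_all, loop over its descendant text nodes with node.findChildren(text=True) and use replace_with() to swap each one for its uppercase version. findChildren() returns a list, so unlike the node.strings generator it is not disrupted when replace_with() detaches each string from the tree: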

from bs4 import BeautifulSoup, NavigableString, Tag
import re

html = '''
<div style="text-transform: uppercase;">
    Foo0
    <font>Foo0</font>
    <div>Foo1
        <div>Foo2</div>
    </div>
</div>
'''
upper_patt = re.compile(r'(?i)text-transform:\s*uppercase')
soup = BeautifulSoup(html, "html.parser")

for node in soup.find_all(attrs={'style': upper_patt}):
    # findChildren() returns a list, so replacing strings while looping is safe
    for child in node.findChildren(recursive=True, text=True):
        child.replace_with(child.text.upper())

print(soup)

Prints:

<div style="text-transform: uppercase;">
    FOO0
    <font>FOO0</font>
    <div>FOO1
        <div>FOO2</div>
    </div>
</div>
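The same idea can be wrapped in a reusable helper. This is a sketch, not part of the original answer: the function name uppercase_styled_text is hypothetical, it uses find_all(string=True) (the newer spelling of the text= argument, available since Beautiful Soup 4.4.0), and it deliberately only touches plain NavigableString nodes so that comments and script/style contents are left alone.

```python
import re
from bs4 import BeautifulSoup, NavigableString

UPPER_PATT = re.compile(r'(?i)text-transform:\s*uppercase')


def uppercase_styled_text(html: str) -> str:
    """Hypothetical helper: uppercase every plain text node inside elements
    whose style sets text-transform: uppercase, keeping tags and attributes."""
    soup = BeautifulSoup(html, "html.parser")
    for node in soup.find_all(attrs={'style': UPPER_PATT}):
        # find_all(string=True) returns a list of descendant strings,
        # so replacing them one by one does not disturb the loop.
        for s in node.find_all(string=True):
            # Skip Comment and other NavigableString subclasses so they
            # are not turned into ordinary visible text.
            if type(s) is NavigableString:
                s.replace_with(s.upper())
    return str(soup)


result = uppercase_styled_text(
    '<p>keep <span style="TEXT-TRANSFORM:uppercase">shout <b>this</b></span></p>'
)
print(result)
# <p>keep <span style="TEXT-TRANSFORM:uppercase">SHOUT <b>THIS</b></span></p>
```

Text outside the styled element ("keep") is untouched, and the (?i) flag lets the pattern match the style regardless of case or spacing around the colon.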

Upvotes: 1
