Reputation: 514
I have this piece of code:
txt = """<p>Hi <span>Mark</span>, how are you?, Don't forget meeting on <strong>sunday</strong>, ok?</p>"""
soup = BeautifulSoup(txt)
for ft in soup.findAll('p'):
    print str(ft).upper()
When I run it, I get this:
<P>HI <SPAN>MARK</SPAN>, HOW ARE YOU?, DON'T FORGET MEETING ON <STRONG>SUNDAY</STRONG>, OK?</P>
But I want to get this:
<p>HI <span>Mark</span>, HOW ARE YOU?, DON'T FORGET MEETING ON <strong>sunday</strong>, ok?</p>
I just want to change the text directly inside the p tag while keeping the text of the tags nested inside it unchanged. I also want to keep the tag names in lowercase.
Thanks
Upvotes: 0
Views: 431
Reputation: 36282
You can assign the modified text to the string attribute of the tag, p.string. So loop over all contents of the <p> tag, use the regular expression module to check whether each item starts with a tag (the symbols < and >), and leave those items untouched. Something like:
from bs4 import BeautifulSoup
import re
txt = """<p>Hi <span>Mark</span>, how are you?, Don't forget meeting on <strong>sunday</strong>, ok?</p>"""
soup = BeautifulSoup(txt)
for p in soup.find_all('p'):
    p.string = ''.join(
        [str(t).upper()
         if not re.match(r'<[^>]+>', str(t))
         else str(t)
         for t in p.contents])
print soup.prettify(formatter=None)
I use the formatter option to avoid the encoding of HTML special symbols. It yields:
<html>
<body>
<p>
HI <span>Mark</span>, HOW ARE YOU?, DON'T FORGET MEETING ON <strong>sunday</strong>, OK?
</p>
</body>
</html>
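As a possible alternative to assigning p.string, the text nodes can be replaced in place, so the nested tags are never turned into strings and no formatter option is needed. This is only a sketch under a few assumptions that are not in the code above: Python 3, the built-in "html.parser" parser (which does not wrap the fragment in <html> and <body>), and the name result, which is just illustrative.

```python
from bs4 import BeautifulSoup, NavigableString

txt = """<p>Hi <span>Mark</span>, how are you?, Don't forget meeting on <strong>sunday</strong>, ok?</p>"""

# Assumption: "html.parser" is used so the fragment is not wrapped in <html>/<body>.
soup = BeautifulSoup(txt, "html.parser")

for p in soup.find_all("p"):
    # Iterate over a copy, because replace_with() modifies p.contents.
    for node in list(p.contents):
        # Direct text children are NavigableString objects; nested tags
        # such as <span> and <strong> are Tag objects and are skipped.
        if isinstance(node, NavigableString):
            node.replace_with(node.upper())

result = str(soup)  # hypothetical name, used only for this example
print(result)
```

Because only the direct text children of each <p> are touched, the text inside <span> and <strong> keeps its original case and the tag names stay lowercase.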
Upvotes: 1