Change content only to parent element text, in Beautifulsoup

Question

I have this piece of code:

txt = """Hi Mark, how are you?, Don't forget meeting on sunday, ok?"""
soup = BeautifulSoup(txt)
for ft in soup.findAll('p'):
    print(str(ft).upper())

When I run it, I get this:

HI MARK, HOW ARE YOU?, DON'T FORGET MEETING ON SUNDAY, OK?

But I want to get this:

HI Mark, HOW ARE YOU?, DON'T FORGET MEETING ON sunday, ok?

I just want to change the inner text of the p tag while keeping the text of the other tags nested inside p unchanged. I also want to keep the tag names in lowercase.

Thanx

Birei · Accepted Answer

You can assign the modified text to the string attribute of the tag, p.string. So loop over all contents of the p tag, use the regular expression module to check whether each item contains the tag symbols < and >, and skip those that do. Something like:

from bs4 import BeautifulSoup
import re

txt = """Hi Mark, how are you?, Don't forget meeting on sunday, ok?"""
soup = BeautifulSoup(txt)
for p in soup.find_all('p'):
    p.string = ''.join(
        [str(t).upper()
            if not re.match(r'<[^>]+>', str(t))
            else str(t)
            for t in p.contents])

print(soup.prettify(formatter=None))

I pass formatter=None to stop the HTML special characters from being escaped on output. It yields:


 
  
   HI Mark, HOW ARE YOU?, DON'T FORGET MEETING ON sunday, OK?
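As a possible alternative, here is a minimal sketch that upper-cases only the text nodes sitting directly under the p tag and leaves the nested tags alone, so nothing has to be re-serialized and formatter=None is not needed. The b and i tags in the sample markup are assumptions, since the original markup is not visible in the post, and upper_direct_text is a hypothetical helper name.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Assumed markup: the <b> and <i> tags are placeholders for whatever
# inline tags the original text contained.
txt = "<p>Hi <b>Mark</b>, how are you?, Don't forget meeting on <i>sunday</i>, ok?</p>"
soup = BeautifulSoup(txt, "html.parser")


def upper_direct_text(tag):
    """Upper-case the text nodes that are direct children of `tag`."""
    # Copy the children into a list first, because replace_with()
    # modifies the tree while we walk it.
    for child in list(tag.children):
        # Exact type check so comments and other NavigableString
        # subclasses are left untouched.
        if type(child) is NavigableString:
            child.replace_with(child.upper())


for p in soup.find_all("p"):
    upper_direct_text(p)

result = str(soup)
print(result)
```

Because only direct children are touched, text inside nested tags keeps its case and the tag names stay lowercase. Note that the trailing ", ok?" is also upper-cased here, since in this assumed markup it is a direct child of p.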
