Python BeautifulSoup - Add Tags around found keyword

Question

I am currently working on a project in which I want to allow regex searches over a huge set of HTML files.

After first pinpointing the files of my interest I now want to highlight the found keyword!

Using BeautifulSoup I can determine the Node in which my keyword is found. One thing I do is changing the color of the whole parent.

However, I would also like to add my own tags around just the keyword(s) I found.

Determining the position and such is no big deal using the find() functions provided by BeautifulSoup. But adding my tags around regular text seems to be impossible?

# match = keyword found by another regex
# node = the node I found using the soup.find(text=myRE)
node.parent.setString(node.replace(match, ""+match+""))

This way I only add plain text and not a proper tag, because the document is not re-parsed afterwards (and re-parsing is something I hope to avoid)!
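A minimal sketch of the effect described above, assuming bs4 with the built-in html.parser; the sample paragraph and the `<b>` tag are illustrative choices, not taken from the actual project:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>This is a paragraph</p>", "html.parser")
text = soup.p.string

# Assigning a string that merely looks like markup: bs4 stores it as plain text.
soup.p.string = text.replace("paragraph", "<b>paragraph</b>")

print(soup)            # the angle brackets are escaped as &lt; and &gt; on output
print(soup.find("b"))  # None, because no real <b> tag exists in the tree
```

The string is kept as a single text node, so the markup characters are escaped when the document is serialized.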

I hope my problem is reasonably clear :)

tzelleke · Accepted Answer

Here's a simple example showing one way to do it:

import re
from bs4 import BeautifulSoup as Soup

html = '''
<p>This is a paragraph</p>
'''

(1) store the text and empty the tag

soup = Soup(html, 'html.parser')
text = soup.p.string
soup.p.clear()
print(soup)

(2) get the start and end positions of the word to be made bold (apologies for my English)

match = re.search(r'\ba\b', text)
start, end = match.start(), match.end()

(3) split the text and add the first part

soup.p.append(text[:start])
print(soup)

(4) create a tag, add the relevant text to it and append it to the parent

b = soup.new_tag('b')
b.append(text[start:end])
soup.p.append(b)
print(soup)

(5) append the rest of the text

soup.p.append(text[end:])
print(soup)

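Here is the output from above (surrounding blank lines omitted; depending on the parser, each line may also be wrapped in <html><body>):

<p></p>
<p>This is </p>
<p>This is <b>a</b></p>
<p>This is <b>a</b> paragraph</p>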
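The five steps above handle a single match in a single paragraph. As a rough sketch of how the same idea might be extended to every match in every text node, the helper below splits each matching text node and replaces it in place. The name `wrap_matches` and its parameters are hypothetical, and the sketch assumes a reasonably recent bs4 (for the `string=` argument of `find_all`) with the built-in html.parser:

```python
import re
from bs4 import BeautifulSoup, NavigableString

def wrap_matches(soup, pattern, tag_name="b"):
    """Wrap every regex match found in text nodes in a new <tag_name> tag."""
    regex = re.compile(pattern)
    # find_all returns a list, so modifying the tree inside the loop is safe.
    for node in soup.find_all(string=regex):
        text = str(node)
        pieces = []
        pos = 0
        for m in regex.finditer(text):
            if m.start() == m.end():
                continue  # ignore empty matches
            if m.start() > pos:
                # plain text between the previous match and this one
                pieces.append(NavigableString(text[pos:m.start()]))
            tag = soup.new_tag(tag_name)
            tag.string = m.group(0)
            pieces.append(tag)
            pos = m.end()
        if pos < len(text):
            pieces.append(NavigableString(text[pos:]))
        if not pieces:
            continue
        # Swap the original text node for the first piece, then chain the rest.
        node.replace_with(pieces[0])
        prev = pieces[0]
        for piece in pieces[1:]:
            prev.insert_after(piece)
            prev = piece
    return soup

soup = BeautifulSoup("<p>This is a paragraph with a keyword</p>", "html.parser")
wrap_matches(soup, r"\ba\b")
print(soup)
```

Because the new tags are created with `soup.new_tag` and attached directly to the tree, the document does not have to be re-parsed. Text inside `<script>` or `<style>` elements is not treated specially here and would need to be filtered out separately.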
