Reputation: 23616
Say that I have this piece of HTML:
<p>This text is my <a href="#">text</a></p>
How do I replace the first "text" with an anchor element, so that the result becomes:
<p>This <a href="#">text</a> is my <a href="#">text</a></p>
I basically want to replace a substring in a NavigableString with a Tag.
Upvotes: 11
Views: 15886
Reputation: 431
You can get the text of the NavigableString, modify it, parse the modified text into a new object model, and then replace the old NavigableString with that object model:
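from bs4 import BeautifulSoup

data = '<p>This text is my <a href="#">text</a></p>'
soup = BeautifulSoup(data, "html.parser")
original_string = soup.p.contents[0]
# Keep the surrounding spaces so the words do not run together.
new_text = str(original_string).replace(' text ', ' <a href="#">text</a> ')
# Parse the modified text and swap the resulting nodes in for the old string.
original_string.replace_with(BeautifulSoup(new_text, "html.parser"))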
Upvotes: 5
Reputation: 4164
Your question has two parts:
1. Turning the single NavigableString "This text is my" into a NavigableString, a Tag, and another NavigableString.
2. Replacing the NavigableString "This text is my" with the three new elements.
The answer to #1 depends on your situation. Specifically it depends on how you determine what part of the text needs linking. I'll use a regular expression to find the string "text":
import re
from bs4 import BeautifulSoup

data = '<p>This text is my <a href="#">text</a></p>'
soup = BeautifulSoup(data, "html.parser")
original_string = soup.p.contents[0]
print(original_string)
# "This text is my "
# The capture group makes re.split() keep the matched word in the result.
this, text, is_my = re.compile("(text)").split(original_string)
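One caveat with the three-way unpacking above: if the pattern matches more than once in the string, re.split() returns more than three pieces and the assignment raises ValueError. Here is a minimal sketch of two ways to split on the first match only, using just the standard library. The sample string is made up for illustration and does not come from the question.

```python
import re

sample = "This text is more text "  # hypothetical string with two matches

# maxsplit=1 stops after the first match; the capture group keeps the match itself.
this, text, rest = re.split("(text)", sample, maxsplit=1)
print(repr(this), repr(text), repr(rest))

# For a literal substring, str.partition does the same job without a regex.
before, match, after = sample.partition("text")
print(repr(before), repr(match), repr(after))
```

Both calls leave the second "text" untouched inside the last piece, so the rest of the answer works unchanged with either one.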
Now for #2. This is not as easy as it could be, but it's definitely possible. First, turn the text variable into a Tag containing the link text:
text_link = soup.new_tag("a", href="#")
text_link.string = text
re.split() turned this and is_my into ordinary Unicode strings. Turn them back into NavigableStrings so they can go back into the tree as elements:
this = soup.new_string(this)
is_my = soup.new_string(is_my)
Now use replace_with() and insert_after() to replace the old element with the three new elements:
original_string.replace_with(this)
this.insert_after(text_link)
text_link.insert_after(is_my)
Now your tree should look the way you want it to:
print(soup.p)
# <p>This <a href="#">text</a> is my <a href="#">text</a></p>
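If you need this in more than one place, the steps can be folded into a small helper. This is only a sketch: link_first_match is a hypothetical name, not part of Beautiful Soup, and it assumes a literal substring match (via str.partition) and the "html.parser" backend.

```python
from bs4 import BeautifulSoup


def link_first_match(soup, node, word, href="#"):
    """Wrap the first occurrence of `word` in the NavigableString `node` in an <a> tag.

    Hypothetical helper, not a bs4 API. Returns True if a replacement was made.
    """
    before, match, after = str(node).partition(word)
    if not match:
        return False  # word not found; leave the tree untouched

    # Build the three replacement nodes: string, tag, string.
    link = soup.new_tag("a", href=href)
    link.string = match
    first = soup.new_string(before)
    last = soup.new_string(after)

    # Swap the old string for the first piece, then chain the rest after it.
    node.replace_with(first)
    first.insert_after(link)
    link.insert_after(last)
    return True


soup = BeautifulSoup('<p>This text is my <a href="#">text</a></p>', "html.parser")
link_first_match(soup, soup.p.contents[0], "text")
print(soup.p)
```

The helper only touches the one string node you pass in, so the existing link at the end of the paragraph is left alone.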
Upvotes: 14