Reputation: 135
I have an HTML document to which I want to add HTML anchor tags, so that I can jump straight to a certain part of the page.
The first step is to find all divs that need to be replaced. Secondly, an anchor tag needs to be added, based on the text inside the div. My code looks as follows:
from bs4 import BeautifulSoup

path = "/text.html"
with open(path) as fp:
    soup = BeautifulSoup(fp, 'html.parser')

mydivs = soup.find_all("p", {"class": "tussenkop"})
for div in mydivs:
    if "Artikel" in div.getText():
        string = div.getText().split()[1]
        div_id = f"""<a id="{string}"></a>{div}"""
        full = f"{div_id}{div}"
        html_soup = BeautifulSoup(full, 'html.parser')
        div = html_soup
A div looks as follows:
<p class="tussenkop"><strong class="tussenkop_vet">Artikel 7.37 text text text</strong></p>
After adding the anchor tag it becomes:
<a id="7.37"></a><p class="tussenkop"><strong class="tussenkop_vet">Artikel 10.6 Inwerkingtreding</strong></p><p class="tussenkop"><strong class="tussenkop_vet">Artikel 7.37 text text text</strong></p>
But the problem is that div is not replaced by the new div. How should I correct this? Or is there another way to insert an anchor tag?
Upvotes: 0
Views: 630
Reputation: 28630
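I'm not quite sure what you expect the output to look like, but BeautifulSoup has methods to create new tags and attributes and insert them into the soup object. Assigning to div inside the loop only rebinds the loop variable; it does not change the soup itself.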
from bs4 import BeautifulSoup

fp = '<p class="tussenkop"><strong class="tussenkop_vet">Artikel 7.37 text text text</strong>'
soup = BeautifulSoup(fp, 'html.parser')
print('soup before:', soup)

mydivs = soup.find_all("p", {"class": "tussenkop"})
for div in mydivs:
    if "Artikel" in div.getText():
        # The second word of the heading is the article number, e.g. "7.37"
        a_string = div.getText().split()[1]
        # Create an empty <a> tag and use the article number as its id
        new_tag = soup.new_tag("a")
        new_tag['id'] = a_string
        # Insert the anchor into the tree, directly before the <p>
        div.insert_before(new_tag)

print('soup after:', soup)
Output:
soup before: <p class="tussenkop"><strong class="tussenkop_vet">Artikel 7.37 text text text</strong></p>
soup after: <a id="7.37"></a><p class="tussenkop"><strong class="tussenkop_vet">Artikel 7.37 text text text</strong></p>
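If it helps, the same idea can be wrapped in a small function that takes an HTML string and returns the modified HTML. This is only a sketch: the function name add_article_anchors is made up for illustration, and it assumes every heading of interest starts with the word "Artikel" followed by the article number (a slightly stricter check than the "Artikel" in ... test above, so that a heading without a number is skipped instead of raising an IndexError).

```python
from bs4 import BeautifulSoup


def add_article_anchors(html: str) -> str:
    """Insert an empty <a id="..."> before each 'Artikel' heading (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    for p in soup.find_all("p", class_="tussenkop"):
        words = p.get_text().split()
        # Assumption: headings look like "Artikel 7.37 ..."; anything else is skipped.
        if len(words) >= 2 and words[0] == "Artikel":
            anchor = soup.new_tag("a", id=words[1])
            # insert_before changes the tree in place, unlike rebinding the loop variable
            p.insert_before(anchor)
    return str(soup)


html = '<p class="tussenkop"><strong class="tussenkop_vet">Artikel 7.37 text text text</strong></p>'
result = add_article_anchors(html)
print(result)
```

To update the file from the question, you could read its contents, pass them through the function, and write the returned string back to the same path.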
Upvotes: 2