Yash

Reputation: 108

BeautifulSoup: create and insert a self-closing tag inside another tag

I am parsing HTML files and replacing specific links with new tags.

Python Code:

from bs4 import BeautifulSoup
sample='''<a href="{Image src='https://google.com' link='https://google.com'}" >{Image src='https://google.com' link='google.com'}</a>'''
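soup = BeautifulSoup(sample, 'html.parser')  # name the parser explicitly
for a in soup.find_all('a'):
    # html.parser does not treat ri:attachment as a void element, so it gets a closing tag
    x = BeautifulSoup('<ac:image><ri:attachment ri:filename="somefile"/> </ac:image>', 'html.parser')
    a.replace_with(x)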

print(soup)

Actual Output:

<ac:image><ri:attachment ri:filename="somefile"></ri:attachment> </ac:image>

Desired Output:

<ac:image><ri:attachment ri:filename="somefile" /></ac:image>

The self-closing tags are automatically converted into an opening and closing tag pair. The destination strictly requires self-closing tags.

Any help would be appreciated!

Upvotes: 1

Views: 1063

Answers (1)

Andrej Kesely

Reputation: 195428

To get correct self-closing tags, use the xml parser (it requires the lxml package) when creating the new soup that will replace the old tag.

Also, to preserve the ac and ri namespace prefixes, the xml parser requires the xmlns:ac and xmlns:ri attributes to be declared. We declare them on a dummy wrapper tag that is removed after processing.

For example:

from bs4 import BeautifulSoup
txt = '''
<div class="my-class">
    <a src="some address">
        <img src="attlasian_logo.gif" />
    </a>
</div>
<div class="my-class">
    <a src="some address2">
        <img src="other_logo.gif" />
    </a>
</div>
'''

template = '''
<div class="_remove_me" xmlns:ac="http://namespace1/" xmlns:ri="http://namespace2/">
<ac:image>
  <ri:attachment ri:filename="{img_src}" />
</ac:image>
</div>
'''

soup = BeautifulSoup(txt, 'html.parser')

for a in soup.select('a'):
    # Use the `xml` parser here; the template must declare xmlns:* so the namespace prefixes are preserved
    a.replace_with(BeautifulSoup(template.format(img_src=a.img['src']), 'xml'))

for div in soup.select('div._remove_me'):
    div.unwrap()  # remove the dummy wrapper but keep its children

print(soup.prettify())

Prints:

<div class="my-class">
 <ac:image>
  <ri:attachment ri:filename="attlasian_logo.gif"/>
 </ac:image>
</div>
<div class="my-class">
 <ac:image>
  <ri:attachment ri:filename="other_logo.gif"/>
 </ac:image>
</div>
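
Applied to the sample from the question, the same idea could look roughly like the sketch below. The helper name replace_links_with_images and the WRAPPER constant are made up for illustration, the namespace URIs are the same placeholders as above, and the filename is inserted without any escaping. It assumes lxml is installed, since the xml parser depends on it.

```python
from bs4 import BeautifulSoup

# Placeholder namespace URIs (assumption); the real ones depend on the destination format.
WRAPPER = (
    '<div xmlns:ac="http://namespace1/" xmlns:ri="http://namespace2/">'
    '<ac:image><ri:attachment ri:filename="{filename}"/></ac:image>'
    '</div>'
)


def replace_links_with_images(html, filename):
    """Hypothetical helper: swap every <a> for an <ac:image> fragment."""
    soup = BeautifulSoup(html, 'html.parser')
    for a in soup.find_all('a'):
        # Parse the fragment with the xml parser so the empty tag stays self-closing.
        wrapper = BeautifulSoup(WRAPPER.format(filename=filename), 'xml').div
        a.replace_with(wrapper)
        # Drop the dummy <div> that only carried the xmlns declarations.
        wrapper.unwrap()
    return str(soup)


sample = '''<a href="{Image src='https://google.com' link='https://google.com'}" >{Image src='https://google.com' link='google.com'}</a>'''
result = replace_links_with_images(sample, 'somefile')
print(result)
```

Here the wrapper tag itself is passed to replace_with and unwrapped right away, so no class selector is needed to find it afterwards.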

Upvotes: 1
