Reputation: 3822
I am converting tag to confluence macro using bs4
input:
<p>
<img src="path/to/file.jpg" />
</p>
expected output:
<p>
<ac:image ac:align="center" ac:layout="center">
<ri:attachment ri:filename="file.jpg" ri:version-at-save="1" />
</ac:image>
</p>
Here's the function that's being used to achieve it
def transform_img_to_confluence(soup):
def get_image_tag(image_name):
return BeautifulSoup(textwrap.dedent('''
<ac:image ac:align="center" ac:layout="center">
<ri:attachment ri:filename="{}" ri:version-at-save="1" />
</ac:image>
''').format(image_name), "html.parser")
for img in soup.find_all('img'):
path = img['src']
image_name = os.path.basename(path)
image_tag = get_image_tag(image_name)
img.replace_with(image_tag)
soup = BeautifulSoup(html_string, "html.parser")
transform_img_to_confluence(soup)
print(soup.prettify())
When I inspect soup after calling this function, I'm expecting tags to be replaced to the following ( ri:attachment
element as inline )
<ac:image ac:align="center" ac:layout="center">
<ri:attachment ri:filename="file.jpg" ri:version-at-save="1" />
</ac:image>
but unfortunately I'm getting this instead. ( ri:attachment
with open and close tags )
<ac:image ac:align="center" ac:layout="center">
<ri:attachment ri:filename="file.jpg" ri:version-at-save="1">
</ri:attachment>
</ac:image>
How to make sure I'm getting the desired inline element ?
Upvotes: 0
Views: 525
Reputation: 2012
the problem comes from the parser. in your case lxml-xml
should do. here are some example outputs from available parsers
from bs4 import BeautifulSoup
a="""<a><b /></a>"""
print(BeautifulSoup(a, 'lxml'))
>>> <html><body><a><b></b></a></body></html>
print(BeautifulSoup(a, 'lxml-xml'))
>>> <?xml version="1.0" encoding="utf-8"?>
>>> <a><b/></a>
print(BeautifulSoup(a, 'html.parser'))
>>> <a><b></b></a>
print(BeautifulSoup(a, 'html5lib'))
>>> <html><head></head><body><a><b></b></a></body></html>
Upvotes: 1