serge

Reputation: 15239

pypandoc does not keep images or formatting when converting

I am trying to convert a .rtf file to .docx using pypandoc:

import pypandoc

# Specify the input RTF file and output DOCX file
input_file = 'test.rtf'
output_file = 'test.docx'

# Convert the RTF file to DOCX
pypandoc.convert_file(input_file, 'docx', outputfile=output_file)

print(f"Conversion complete. The DOCX file is saved as {output_file}")

However, colors and pictures in the original file are not kept in the resulting .docx. Am I missing some settings?

Package     Version
----------- -------
windows     10
python      3.11
cobble      0.1.4
lxml        5.3.0
mammoth     1.6.0
pip         23.2.1
pypandoc    1.13
python-docx 0.8.11
pywin32     306
setuptools  65.5.0

Upvotes: 0

Views: 176

Answers (1)

tarleb

Reputation: 22659

The fifth paragraph of the pandoc user guide is probably the part that's cited the most:

Because pandoc’s intermediate representation of a document is less expressive than many of the formats it converts between, one should not expect perfect conversions between every format and every other. Pandoc attempts to preserve the structural elements of a document, but not formatting details such as margin size. And some document elements, such as complex tables, may not fit into pandoc’s simple document model. While conversions from pandoc’s Markdown to all formats aspire to be perfect, conversions from formats more expressive than pandoc’s Markdown can be expected to be lossy.

In other words: those colors won't come through, no matter the settings. Images should work in general, but that's hard to judge without seeing the actual document.
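To narrow down the image part, it may help to check whether any pictures made it into the output at all. A .docx file is a ZIP archive, and embedded images usually live under word/media/. The helper below is only a sketch under that assumption; the name list_docx_media is made up for illustration and is not part of pypandoc or pandoc.

```python
import zipfile


def list_docx_media(docx_file):
    """Return the names of media files embedded in a DOCX archive.

    Assumes the usual DOCX layout, where images are stored under
    word/media/. `docx_file` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as archive:
        return sorted(
            name
            for name in archive.namelist()
            # Keep real files under word/media/, skip directory entries.
            if name.startswith("word/media/") and not name.endswith("/")
        )


# Example usage after running the conversion from the question:
#   media = list_docx_media("test.docx")
#   print(media or "no embedded images found")
```

If the list is empty, the pictures were most likely dropped while reading the RTF, so the cause would be in the input file rather than in Word's rendering of the result.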

Upvotes: 0
