Reputation: 15682
I want to do a very simple manipulation of a LibreOffice Writer document, then save it again as an ODT file.
What might be wrong with the code below? When I run it I get two content.xml entries in the zip file (the ODT file). Strangely, both of them (if unzipped as "content.xml" and "content_1.xml", for example) seem to contain the modified content.
    from zipfile import ZipFile
    from xml.dom.minidom import parseString

    zipfile = ZipFile( file_path, "a" )
    for zip_info in zipfile.infolist():
        contents = zipfile.read( zip_info.filename )
        if ( zip_info.filename == "content.xml" ):
            document_root = parseString( contents )
            # ... mess around with the contents DOM document...
            zipfile.writestr( zip_info, document_root.toxml() )
    zipfile.close()
I'm aware that there are various add-ins and tools you can use (UNO), but I want to keep it as simple as possible.
Later:
My solution: having found that Python's zipfile module offers no way to delete an entry from a zip file, I initially decided to take the "make a new zip" approach: Delete file from zipfile with the ZipFile Module
However, although I was able to open the resulting ODT file and extract all the files from it, 7-Zip complained about a CRC failure, saying content.xml was now "broken". Presumably this was due to the brutal substitution of one "content.xml" for another.
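For reference, the "make a new zip" approach can be sketched with the standard zipfile module alone. This is only a minimal sketch: `replace_member` is a hypothetical helper name, and it assumes the replacement content is already available as bytes. Each entry is copied into a fresh archive, with the new data swapped in for the one entry being replaced. Because `writestr` computes the CRC and sizes for whatever data it is given, this should avoid a stale CRC on the replaced entry.

```python
import os
import shutil
import tempfile
from zipfile import ZipFile


def replace_member(odt_path, member_name, new_bytes):
    """Rewrite the archive at odt_path with new_bytes stored as member_name.

    Hypothetical helper, not part of the original post.
    """
    folder = os.path.dirname(os.path.abspath(odt_path))
    fd, tmp_path = tempfile.mkstemp(suffix=".odt", dir=folder)
    os.close(fd)
    try:
        with ZipFile(odt_path, "r") as zin, ZipFile(tmp_path, "w") as zout:
            for item in zin.infolist():
                if item.filename == member_name:
                    data = new_bytes
                else:
                    data = zin.read(item.filename)
                # Passing the ZipInfo keeps the entry's name, position and
                # compression method; the CRC is recomputed from data.
                zout.writestr(item, data)
        # Only replace the original once the new archive is complete.
        shutil.move(tmp_path, odt_path)
    finally:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

With the question's variables this would be called as `replace_member( file_path, "content.xml", document_root.toxml( "utf-8" ) )`. Copying the entries in their original order also keeps the "mimetype" entry first in the archive.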
Final answer: 1) Write the modified DOM structure to a plain file named "content.xml" in the same directory as the ODT file:
    # Write UTF-8 bytes; this works on Python 2 and 3, unlike "print >>f"
    f = open( file_dir + '\\content.xml', "wb" )
    f.write( document_root.toxml( "utf-8" ) )
    f.close()
2) Call the 7-Zip CLI once the ODT file has been closed programmatically:
    import subprocess
    # "u" updates the archive, replacing content.xml inside temp.odt
    subprocess.Popen( "7z u temp.odt content.xml", cwd=file_dir, shell=True )
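Whichever route is used, the result can be sanity-checked from Python for the two symptoms mentioned above (a duplicate content.xml and a CRC failure). A rough sketch, where `check_odt` is a made-up helper name and the 7z process is assumed to have finished before it is called:

```python
from zipfile import ZipFile


def check_odt(odt_path, member_name="content.xml"):
    """Return (copies, first_bad) for the archive at odt_path.

    copies    -- how many entries are named member_name (should be 1)
    first_bad -- name of the first entry failing its CRC check, or None
    """
    with ZipFile(odt_path, "r") as zf:
        copies = zf.namelist().count(member_name)
        first_bad = zf.testzip()
    return copies, first_bad
```

A healthy file should give `(1, None)`; a file produced by the append-mode code in the question would be expected to report two copies.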
Upvotes: 3
Views: 1391
Reputation: 4331
Depending on where the documents come from, you might want to skip messing around with the zip file and use the Flat XML OpenDocument format (I believe the extension is .fodt) and just manipulate the XML directly. The files will be larger, but they compress rather well, and you can always save them as .odt files when you've finished messing around with them.
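As a rough illustration of what "manipulate the XML directly" could look like, here is a sketch using minidom, as in the question. `replace_text_in_fodt` is a hypothetical helper, and it assumes the document uses the usual `text:` prefix for paragraph elements. It only touches text sitting directly inside a `text:p` element, so text nested in spans is left alone.

```python
from xml.dom.minidom import parse


def replace_text_in_fodt(src_path, dst_path, old, new):
    """Replace old with new in the direct text of every text:p element.

    Hypothetical helper for illustration; writes the result to dst_path.
    """
    doc = parse(src_path)
    # minidom matches on the qualified tag name, prefix included.
    for paragraph in doc.getElementsByTagName("text:p"):
        for child in paragraph.childNodes:
            if child.nodeType == child.TEXT_NODE:
                child.data = child.data.replace(old, new)
    # toxml("utf-8") returns encoded bytes, so open the file in binary mode.
    with open(dst_path, "wb") as f:
        f.write(doc.toxml("utf-8"))
```

No zip handling is needed here, since a .fodt file is a single XML document.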
Upvotes: 1