Nomura Nori

Reputation: 5187

python docx2txt extracts images out of document order

I am using docx2txt to extract images from a docx file. The file has multiple images and all of them are extracted, but the order is not the same as in the document. For example, the images are extracted as image1.png, image2.png and image3.png, but image3.png is actually the topmost image in the docx, so it should be named image1.png. Is there any option to extract the images and name them in the order they appear in the docx?

Upvotes: 0

Views: 293

Answers (1)

Y.C.

Reputation: 201

I looked through the source code of the docx2txt library and couldn't find any code that renames image files; the images keep the names they already have inside the docx. So I guess it's the word processor you're using that names the images that way. I used Microsoft Word 2013 in all my tests, and it always numbered the images according to their order in the document.
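You can check which names the authoring software chose by listing the media entries of the docx archive directly. The sketch below uses only the standard `zipfile` module; the helper name `list_media` is hypothetical, and it assumes the images live under `word/media/`, which is where Word stores them.

```python
import zipfile


def list_media(docx_path):
    """Return the media entries stored in a .docx archive, in archive order.

    docx_path may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_path) as zf:
        # Only the embedded media files; XML parts are ignored.
        return [name for name in zf.namelist() if name.startswith("word/media/")]
```

For example, `print(list_media("example.docx"))` prints entries like `word/media/image1.png`. If these names already disagree with the visual order, no tool that simply copies them out will fix the order.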

As far as I understand, a docx file is a zip archive of XML files and media files (images, videos, etc.). The software that creates the document, such as Microsoft Word, decides what the files inside the zip are called. Maybe you are processing docx files created with a different version of Word or with other software. That software may give a newly added image the next free number instead of renumbering all the images whenever new media is added.
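If you need the images numbered by position regardless of how they were stored, one approach is to ignore the stored file names and read the order from `word/document.xml` instead. Each embedded picture is referenced there by an `<a:blip r:embed="rIdN">` element, and `word/_rels/document.xml.rels` maps each `rIdN` to a file such as `media/image3.png`. The sketch below is a hypothetical helper, not part of docx2txt; it assumes that layout and only handles images in the main body (headers, footers and legacy VML images are ignored).

```python
import os
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# Standard OOXML namespace URIs.
NS_A = "http://schemas.openxmlformats.org/drawingml/2006/main"
NS_R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
NS_PKG_REL = "http://schemas.openxmlformats.org/package/2006/relationships"


def extract_images_in_order(docx_path, out_dir):
    """Save body images as image1.ext, image2.ext, ... by first appearance.

    Returns the list of written file paths.
    """
    os.makedirs(out_dir, exist_ok=True)
    with zipfile.ZipFile(docx_path) as zf:
        # Map relationship ids (rId...) to targets like "media/image3.png".
        rels = ET.fromstring(zf.read("word/_rels/document.xml.rels"))
        targets = {
            rel.get("Id"): rel.get("Target")
            for rel in rels.iter(f"{{{NS_PKG_REL}}}Relationship")
        }
        body = ET.fromstring(zf.read("word/document.xml"))
        written, seen = [], set()
        # iter() walks the XML tree in document order, i.e. top to bottom.
        for blip in body.iter(f"{{{NS_A}}}blip"):
            target = targets.get(blip.get(f"{{{NS_R}}}embed"))
            if target is None:
                continue  # linked (not embedded) image or unknown id
            # Targets are normally relative to word/, but may be absolute.
            if target.startswith("/"):
                member = target.lstrip("/")
            else:
                member = posixpath.normpath(posixpath.join("word", target))
            if member in seen:
                continue  # same picture used again; keep its first position
            seen.add(member)
            ext = posixpath.splitext(member)[1]
            dst = os.path.join(out_dir, f"image{len(written) + 1}{ext}")
            with open(dst, "wb") as f:
                f.write(zf.read(member))
            written.append(dst)
    return written
```

Deduplicating by archive member means a picture that appears twice (or once in both branches of an `mc:AlternateContent` block) is saved only once, at its first position.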

Upvotes: 0
