unstructured cannot find images

Question

I am trying to use the unstructured library to convert a Word document into a JSON file. However, it does not seem to see the images: the list of returned elements should include elements of type "Image", but it doesn't. No error is thrown; the image elements are simply missing. Below are my code and my test file. The test file contains a string, then an image, then another string, yet the image is not detected. What am I doing wrong?

import json
import os

from unstructured.partition.docx import partition_docx

# Set environment variables
os.environ['UNSTRUCTURED_API_KEY'] = "your unstructured.io api key"
os.environ['UNSTRUCTURED_API_URL'] = "https://api.unstructuredapp.io/general/v0/general"

# Partition the .docx into a list of Element objects
elements = partition_docx(filename="input/test.docx")

# Convert each element to a plain dict so it can be serialized
element_dicts = [element.to_dict() for element in elements]

# Save as JSON (create the output folder if it does not exist yet)
os.makedirs("output", exist_ok=True)
with open("output/test.json", "w") as f_json:
    json.dump(element_dicts, f_json, indent=2)
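To confirm exactly which element types came back, one option is to tally the "type" field of the saved JSON. This is a minimal sketch, assuming each dict produced by Element.to_dict() carries a "type" key (such as "NarrativeText" or "Image") and that the output was written to output/test.json as in the script; count_element_types and has_images are hypothetical helper names, not part of unstructured.

```python
import json
from collections import Counter
from pathlib import Path


def count_element_types(elements: list[dict]) -> Counter:
    """Tally the "type" field of serialized unstructured elements."""
    # Assumption: each dict from Element.to_dict() has a "type" key;
    # dicts without one are counted under "<missing>".
    return Counter(element.get("type", "<missing>") for element in elements)


def has_images(elements: list[dict]) -> bool:
    """Return True if any serialized element is of type "Image"."""
    # Counter returns 0 for keys it has never seen
    return count_element_types(elements)["Image"] > 0


if __name__ == "__main__":
    # Output file written by the partitioning script in the question
    path = Path("output/test.json")
    if path.exists():
        with path.open() as f_json:
            elements = json.load(f_json)
        print(count_element_types(elements))
        print("Image elements found:", has_images(elements))
    else:
        print(f"{path} not found; run the partitioning script first")
```

If the tally shows only text types (for example two "NarrativeText" entries for the two strings), that matches the symptom described: the partitioner returned the text but emitted nothing for the picture.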

My project structure:

root
├── input
└── output

Here's the file: test.docx
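It can also help to rule out the input file itself. A .docx file is a ZIP archive (Office Open XML), and a picture inserted normally in Word is typically stored as a separate file under word/media/ inside it. The sketch below, using only the standard-library zipfile module, lists those entries; list_embedded_media is a hypothetical helper name, and the assumption is that the image is embedded rather than linked to an external file (a linked picture would not show up under word/media/).

```python
import zipfile
from pathlib import Path


def list_embedded_media(docx_file) -> list[str]:
    """List entries stored under word/media/ in a .docx (a ZIP archive).

    docx_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as archive:
        return [name for name in archive.namelist() if name.startswith("word/media/")]


if __name__ == "__main__":
    # Input file used in the question
    path = Path("input/test.docx")
    if path.exists():
        media = list_embedded_media(path)
        print("Embedded media:", media if media else "none found")
    else:
        print(f"{path} not found")
```

A non-empty list (e.g. an entry like word/media/image1.png) shows the picture really is in the file, so the missing "Image" element comes from how the document is partitioned rather than from the document itself.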
