user2966197

llama-index: SimpleDirectoryReader with UnstructuredReader not working

I am trying to use the Unstructured.io reader with llama-index, as described here.

I have a PDF file and an HTML file in my data directory. When I run the code, I get the following error:

File "main.py", line 199, in lddataV2
    SimpleDirectoryReader = download_loader("SimpleDirectoryReader")
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../venv/lib/python3.11/site-packages/llama_index/readers/download.py", line 211, in download_loader
    spec.loader.exec_module(module)  # type: ignore
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap_external>", line 936, in exec_module
  File "<frozen importlib._bootstrap_external>", line 1073, in get_code
  File "<frozen importlib._bootstrap_external>", line 1130, in get_data
FileNotFoundError: [Errno 2] No such file or directory: '/.../venv/lib/python3.11/site-packages/llama_index/readers/llamahub_modules/file/base.py'

Here is my code:

from llama_index import download_loader

SimpleDirectoryReader = download_loader("SimpleDirectoryReader")

loader = SimpleDirectoryReader('output_html', file_extractor={
    ".pdf": "UnstructuredReader",
    ".html": "UnstructuredReader"
})
documents = loader.load_data()

I am using llama-index 0.6.2 and Python 3.11.
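When debugging loader errors like this, it can help to confirm which versions are actually installed in the active virtualenv. Below is a small stdlib-only check. The distribution name "llama-index" is an assumption about how the package was installed with pip.

```python
import sys
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Python version of the interpreter running this script (e.g. the venv's python).
    print("Python:", sys.version.split()[0])
    print("llama-index:", installed_version("llama-index") or "not installed")
```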

Answers (1)

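Try it this way. SimpleDirectoryReader can be imported directly from llama_index, so only UnstructuredReader needs to go through download_loader: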

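from llama_index import download_loader, SimpleDirectoryReader

# Fetch only the Unstructured loader from LlamaHub.
UnstructuredReader = download_loader('UnstructuredReader')

# './example2' is the answerer's data directory; use your own (e.g. 'output_html').
# Each extension maps to a reader instance, not a class-name string.
dir_reader = SimpleDirectoryReader('./example2', file_extractor={
  ".pdf": UnstructuredReader(),
  ".html": UnstructuredReader(),
})
documents = dir_reader.load_data()

print(documents)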
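The answer also passes reader instances (UnstructuredReader()) in file_extractor rather than the strings used in the question. The sketch below is a simplified, hypothetical model of extension-based dispatch, not llama_index's actual code. It assumes the directory reader calls load_data() on whatever object is registered for a file's extension. Under that assumption, a string value fails with AttributeError. FakeUnstructuredReader and load_directory are illustrative names only.

```python
from pathlib import Path
from typing import Any, Dict, List


class FakeUnstructuredReader:
    """Hypothetical stand-in for UnstructuredReader: returns one fake document per file."""

    def load_data(self, file: Path) -> List[str]:
        return [f"parsed {file.name}"]


def load_directory(files: List[Path], file_extractor: Dict[str, Any]) -> List[str]:
    """Simplified model of extension-based dispatch (not llama_index's real code)."""
    documents: List[str] = []
    for file in files:
        reader = file_extractor.get(file.suffix)
        if reader is None:
            continue  # no reader registered for this extension; skipped in this sketch
        # Assumes the registered value is an object with load_data();
        # a plain string such as "UnstructuredReader" raises AttributeError here.
        documents.extend(reader.load_data(file))
    return documents


files = [Path("report.pdf"), Path("page.html")]

# Reader instances, as in the answer: each file is dispatched to load_data().
print(load_directory(files, {".pdf": FakeUnstructuredReader(), ".html": FakeUnstructuredReader()}))
# ['parsed report.pdf', 'parsed page.html']

# Class-name strings, as in the question: a str has no load_data() method.
try:
    load_directory(files, {".pdf": "UnstructuredReader", ".html": "UnstructuredReader"})
except AttributeError as err:
    print("AttributeError:", err)
```

The files are never opened, so the sketch runs without any real PDF or HTML on disk.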
