I am trying to use the Unstructured.io reader (UnstructuredReader) for llama-index, as defined here. I have a PDF file and an HTML file in my data directory, and when I run the code I get the following error:
File "main.py", line 199, in lddataV2
SimpleDirectoryReader = download_loader("SimpleDirectoryReader")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/.../venv/lib/python3.11/site-packages/llama_index/readers/download.py", line 211, in download_loader
spec.loader.exec_module(module) # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<frozen importlib._bootstrap_external>", line 936, in exec_module
File "<frozen importlib._bootstrap_external>", line 1073, in get_code
File "<frozen importlib._bootstrap_external>", line 1130, in get_data
FileNotFoundError: [Errno 2] No such file or directory: '/.../venv/lib/python3.11/site-packages/llama_index/readers/llamahub_modules/file/base.py'
Here is my code:
from llama_index import download_loader

SimpleDirectoryReader = download_loader("SimpleDirectoryReader")
loader = SimpleDirectoryReader('output_html', file_extractor={
    ".pdf": "UnstructuredReader",
    ".html": "UnstructuredReader"
})
documents = loader.load_data()
My llama-index version is 0.6.2, and I am using Python 3.11.
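The traceback bottoms out in importlib: `download_loader` calls `spec.loader.exec_module(module)`, and importlib appears to read the source file only at that point, which is why a missing `llamahub_modules/file/base.py` surfaces as a `FileNotFoundError` raised from `get_data`. The following stdlib-only sketch reproduces that failure mode in isolation. The module name `hypothetical_loader` and the temporary path are made up for illustration; they are not what llama-index actually uses.

```python
import importlib.util
import os
import tempfile


def load_module_from_path(name: str, path: str):
    """Load a Python module from an explicit file path.

    spec_from_file_location() only records the path; the file is not
    opened until exec_module() runs, so a missing file fails there.
    """
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # raises FileNotFoundError if path is missing
    return module


# A path inside a fresh temporary directory, so it cannot exist.
missing_path = os.path.join(tempfile.mkdtemp(), "file", "base.py")

try:
    load_module_from_path("hypothetical_loader", missing_path)
    error = None
except FileNotFoundError as exc:
    error = exc

print(type(error).__name__)  # FileNotFoundError
```

Creating the spec succeeds even though the file is absent; the error only appears once the loader tries to read the source, matching the `exec_module` → `get_code` → `get_data` frames in the traceback.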
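Try it this way. SimpleDirectoryReader is part of llama_index itself, so import it directly instead of fetching it with download_loader; use download_loader only for UnstructuredReader, and pass reader instances (not strings) in file_extractor: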
from llama_index import download_loader, SimpleDirectoryReader
UnstructuredReader = download_loader('UnstructuredReader')
dir_reader = SimpleDirectoryReader('./example2', file_extractor={
    ".pdf": UnstructuredReader(),
    ".html": UnstructuredReader(),
})
documents = dir_reader.load_data()
print(documents)
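The key difference from the question's code is that `file_extractor` maps each extension to a reader object rather than to a name string. The following stdlib-only sketch illustrates that extension-based dispatch pattern; `UpperReader` and `read_directory` are hypothetical names, and this is not llama-index's actual `SimpleDirectoryReader` implementation.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


class UpperReader:
    """Hypothetical stand-in for a reader object such as UnstructuredReader."""

    def load_data(self, file: Path) -> list[str]:
        # Return the file's text upper-cased, as a one-element "document" list.
        return [file.read_text().upper()]


def read_directory(input_dir: str, file_extractor: dict) -> list[str]:
    """Hypothetical sketch: dispatch each file to a reader chosen by its suffix."""
    documents: list[str] = []
    for path in sorted(Path(input_dir).iterdir()):
        reader = file_extractor.get(path.suffix)
        if reader is None:
            # No reader registered for this extension: fall back to plain text.
            documents.append(path.read_text())
        else:
            # The mapped value must be an object with a load_data() method.
            documents.extend(reader.load_data(path))
    return documents


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.html").write_text("<p>hi</p>")
    Path(tmp, "b.txt").write_text("plain")

    # Reader instance: works.
    docs = read_directory(tmp, {".html": UpperReader()})
    print(docs)  # ['<P>HI</P>', 'plain']

    # String name instead of an instance: fails as soon as the reader is used.
    try:
        read_directory(tmp, {".html": "UpperReader"})
        err_msg = ""
    except AttributeError as exc:
        err_msg = str(exc)
    print(err_msg)  # 'str' object has no attribute 'load_data'
```

A plain string carries no behavior, so any code that calls a method on the mapped value needs a real object there, which is what `UnstructuredReader()` provides in the answer.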