Reputation: 31
I'm trying to use llama_index, a library that builds an index from your own documents and lets you ask a GPT model questions about their contents.
This is the full code (with my own API key in place of the placeholder):
import os
os.environ["OPENAI_API_KEY"] = 'YOUR_OPENAI_API_KEY'
from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader
documents = SimpleDirectoryReader('data').load_data()
index = GPTSimpleVectorIndex.from_documents(documents)
When I run the index build according to the steps in their documentation, it fails at this step:
index = GPTSimpleVectorIndex.from_documents(documents)
with the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\COLMI\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_index\indices\base.py", line 92, in from_documents
service_context = service_context or ServiceContext.from_defaults()
File "C:\Users\COLMI\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_index\indices\service_context.py", line 71, in from_defaults
embed_model = embed_model or OpenAIEmbedding()
File "C:\Users\COLMI\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_index\embeddings\openai.py", line 209, in __init__
super().__init__(**kwargs)
File "C:\Users\COLMI\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_index\embeddings\base.py", line 55, in __init__
self._tokenizer: Callable = globals_helper.tokenizer
File "C:\Users\COLMI\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_index\utils.py", line 50, in tokenizer
enc = tiktoken.get_encoding("gpt2")
File "C:\Users\COLMI\AppData\Local\Programs\Python\Python310\lib\site-packages\tiktoken\registry.py", line 63, in get_encoding
enc = Encoding(**constructor())
File "C:\Users\COLMI\AppData\Local\Programs\Python\Python310\lib\site-packages\tiktoken_ext\openai_public.py", line 11, in gpt2
mergeable_ranks = data_gym_to_mergeable_bpe_ranks(
File "C:\Users\COLMI\AppData\Local\Programs\Python\Python310\lib\site-packages\tiktoken\load.py", line 83, in data_gym_to_mergeable_bpe_ranks
for first, second in bpe_merges:
ValueError: not enough values to unpack (expected 2, got 1)
I should mention that I tried this on DOCX files in a folder that contains such files as well as subfolders, some of which also contain DOCX files.
Upvotes: 0
Views: 6970
Reputation: 31
It seems the problem was in how I was using the code.
The value 'data' is not a fixed keyword required by the function; it is simply an example name for the folder that contains the user's files.
A relative path can be used, for example:
documents = SimpleDirectoryReader('my_folder').load_data()
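or an absolute path, written as a raw string so that the backslashes are not treated as escape sequences (in a regular string literal, '\u' starts a Unicode escape and causes a SyntaxError):
documents = SimpleDirectoryReader(r'c:\users\user\my_files').load_data()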
If you use this approach, everything will work as expected.
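To confirm that the path points where you expect before handing it to SimpleDirectoryReader, you can resolve it with the standard library first. This is only a minimal sketch: `resolve_input_dir` is a hypothetical helper, not part of llama_index, and the folder names are the examples from above.

```python
from pathlib import Path, PureWindowsPath

# Raw string: backslashes are kept literally, so "\u" is not read as an escape.
raw_path = r'c:\users\user\my_files'
# Forward slashes are an alternative spelling of the same Windows path.
slash_path = 'c:/users/user/my_files'

# PureWindowsPath normalizes separators, so both spellings compare equal.
same = PureWindowsPath(raw_path) == PureWindowsPath(slash_path)
print(same)


def resolve_input_dir(folder):
    """Hypothetical helper: fail early with a clear message if the folder is missing."""
    path = Path(folder).expanduser()
    if not path.is_dir():
        raise FileNotFoundError(f"Input folder not found: {path}")
    return path
```

The returned path can then be passed on, for example as `SimpleDirectoryReader(str(resolve_input_dir('my_folder')))`.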
Upvotes: 1
Reputation: 21
If your files are in subfolders, you must set the recursive argument to True:
documents = SimpleDirectoryReader('documents', recursive=True).load_data()
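To see which files only a recursive scan would reach, you can compare a top-level listing with a recursive one. This is a standard-library sketch of the idea, not how llama_index implements the option; the function name `list_files` is made up for illustration, and 'documents' is the example folder from above.

```python
from pathlib import Path


def list_files(folder, recursive=False):
    """Return files directly in `folder`, or in all subfolders too if recursive."""
    root = Path(folder)
    entries = root.rglob('*') if recursive else root.glob('*')
    return sorted(p for p in entries if p.is_file())


if __name__ == '__main__':
    top_level = list_files('documents')
    everything = list_files('documents', recursive=True)
    # Files that appear only in the recursive listing live in subfolders.
    for path in everything:
        if path not in top_level:
            print(path)
```

Any path printed here sits in a subfolder, so a non-recursive listing of the top-level folder would not include it.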
Upvotes: 2