Mazze

Reputation: 453

LlamaIndex library not respecting LLAMA_INDEX_CACHE_DIR environment variable

I'm using the LlamaIndex library in my Python project to handle some data processing tasks. According to the documentation (Link), I can control where additional data is downloaded by setting the LLAMA_INDEX_CACHE_DIR environment variable. However, even with this variable set, the library seems to ignore it and keeps storing data in a different location.

Here's how I'm setting the environment variable in my Python script:

import os

os.environ["LLAMA_INDEX_CACHE_DIR"] = "/path/to/my/cache/directory"
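The variable only has an effect if it is already in os.environ by the time the library code that reads it runs, so it is safest to set it, and create the directory, before any other imports. A minimal sketch, assuming you want the directory created up front; `set_cache_dir` is a hypothetical helper, and a temporary folder stands in for "/path/to/my/cache/directory":

```python
import os
import tempfile
from pathlib import Path


def set_cache_dir(var_name: str, directory: str) -> Path:
    """Create `directory` if needed and export it under `var_name` (hypothetical helper)."""
    path = Path(directory).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)
    # Only code that looks up this exact variable name will see the new value.
    os.environ[var_name] = str(path)
    return path


# Call this before importing llama_index or anything that imports it.
# A temp folder stands in for "/path/to/my/cache/directory".
cache_root = set_cache_dir(
    "LLAMA_INDEX_CACHE_DIR", os.path.join(tempfile.gettempdir(), "llama_index_cache_demo")
)
print(os.environ["LLAMA_INDEX_CACHE_DIR"])
```

Setting the variable this way affects only the current Python process and any child processes it starts; it does not change the shell you launched Python from.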

When creating the index storage (see code below), nltk_data gets downloaded to /Users/user/nltk_data instead of the path I set in the environment variable.

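# Import paths assume llama-index >= 0.10 (llama-index-core + llama-index-readers-file)
from pathlib import Path
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.readers.file import UnstructuredReader

loader = UnstructuredReader()
doc = loader.load_data(file=Path(file), split_documents=False)  # `file`: path to the input document
storage_context = StorageContext.from_defaults()
cur_index = VectorStoreIndex.from_documents(doc, storage_context=storage_context)
storage_context.persist(persist_dir="./storage/name")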

I've checked for typos, ensured correct permissions on the cache directory, and set the environment variable before importing the LlamaIndex library, but the issue persists.
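One low-effort troubleshooting step is to print every cache-related variable the running process can actually see, right before the code that triggers the download. A sketch; the variable list is an assumption: LLAMA_INDEX_CACHE_DIR comes from the question, TIKTOKEN_CACHE_DIR and DATA_GYM_CACHE_DIR from the answer below, and NLTK_DATA is the variable NLTK documents for locating its data. `cache_env_report` is a hypothetical helper.

```python
import os
from typing import Dict, Iterable, Mapping, Optional

# Variables worth checking; NLTK_DATA is an assumption, not something LlamaIndex reads itself.
CACHE_VARS = ("LLAMA_INDEX_CACHE_DIR", "TIKTOKEN_CACHE_DIR", "DATA_GYM_CACHE_DIR", "NLTK_DATA")


def cache_env_report(
    names: Iterable[str] = CACHE_VARS, env: Optional[Mapping[str, str]] = None
) -> Dict[str, Optional[str]]:
    """Return each variable's value in `env` (default: os.environ), or None if it is unset."""
    source = os.environ if env is None else env
    return {name: source.get(name) for name in names}


for name, value in cache_env_report().items():
    print(f"{name} = {value if value is not None else '<unset>'}")
```

If a variable you set shows as unset here, it was never exported to this process; if it is set but files still land elsewhere, the code doing the download is probably reading a different variable.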

Could anyone suggest why LlamaIndex might not be respecting the LLAMA_INDEX_CACHE_DIR environment variable, and how I can troubleshoot or resolve this issue?

Any insights or suggestions would be greatly appreciated. Thank you!

Upvotes: 0

Views: 454

Answers (1)

teh_loraxx

Reputation: 1

I just had the same issue. What worked for me was setting TIKTOKEN_CACHE_DIR instead of LLAMA_INDEX_CACHE_DIR.

In my case, llama_index was calling into another library, tiktoken:

import tiktoken

enc = tiktoken.get_encoding("gpt2")

tiktoken, in turn, checks these two environment variables, not LLAMA_INDEX_CACHE_DIR:

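import os
import tempfile
if "TIKTOKEN_CACHE_DIR" in os.environ:
    cache_dir = os.environ["TIKTOKEN_CACHE_DIR"]
elif "DATA_GYM_CACHE_DIR" in os.environ:
    cache_dir = os.environ["DATA_GYM_CACHE_DIR"]
else:  # fallback: a folder under the system temp directory
    cache_dir = os.path.join(tempfile.gettempdir(), "data-gym-cache")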
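The lookup above can be mirrored as a small pure function, which makes the precedence easy to check without downloading anything. A sketch that assumes tiktoken's fallback, when neither variable is set, is a data-gym-cache folder under the system temp directory; `resolve_tiktoken_cache_dir` is a hypothetical name, not part of tiktoken's API.

```python
import os
import tempfile
from typing import Mapping, Optional


def resolve_tiktoken_cache_dir(env: Optional[Mapping[str, str]] = None) -> str:
    """Mirror the precedence shown above (hypothetical helper, not tiktoken's API)."""
    source = os.environ if env is None else env
    if "TIKTOKEN_CACHE_DIR" in source:  # highest priority
        return source["TIKTOKEN_CACHE_DIR"]
    if "DATA_GYM_CACHE_DIR" in source:  # legacy name, checked second
        return source["DATA_GYM_CACHE_DIR"]
    # Assumed fallback when neither variable is set.
    return os.path.join(tempfile.gettempdir(), "data-gym-cache")


print(resolve_tiktoken_cache_dir({"TIKTOKEN_CACHE_DIR": "/a", "DATA_GYM_CACHE_DIR": "/b"}))  # /a
```

LLAMA_INDEX_CACHE_DIR never appears in this chain, which is why setting it alone does not change where tiktoken stores its files.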

Upvotes: 0
