Reputation: 453
I'm using the LlamaIndex library in my Python project to handle some data processing tasks. According to the documentation (Link), I can control where additional data is downloaded by setting the LLAMA_INDEX_CACHE_DIR environment variable. However, despite setting this environment variable, the LlamaIndex library seems to ignore it and continues to store data in a different location.
Here's how I'm setting the environment variable in my Python script:
import os
os.environ["LLAMA_INDEX_CACHE_DIR"] = "/path/to/my/cache/directory"
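As a sanity check (the path below is a placeholder), I also confirmed the variable is actually visible to the process, and set it before any library imports:

```python
import os

# Placeholder path; set the variable BEFORE importing llama_index,
# since some libraries read environment variables once at import time.
os.environ["LLAMA_INDEX_CACHE_DIR"] = "/path/to/my/cache/directory"

# Confirm the variable is visible to this process.
print(os.environ.get("LLAMA_INDEX_CACHE_DIR"))  # → /path/to/my/cache/directory
```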
When creating the index storage (see code below), nltk_data gets downloaded to /Users/user/nltk_data instead of the path I set in the environment variable.
# Import paths assume the llama_index >= 0.10 package layout; adjust for older versions.
from pathlib import Path
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.readers.file import UnstructuredReader

loader = UnstructuredReader()
doc = loader.load_data(file=Path(file), split_documents=False)
storage_context = StorageContext.from_defaults()
cur_index = VectorStoreIndex.from_documents(doc, storage_context=storage_context)
storage_context.persist(persist_dir="./storage/name")
I've checked for typos, ensured correct permissions on the cache directory, and set the environment variable before importing the LlamaIndex library, but the issue persists.
Could anyone suggest why LlamaIndex might not be respecting the LLAMA_INDEX_CACHE_DIR environment variable, and how I can troubleshoot or resolve this issue?
Any insights or suggestions would be greatly appreciated. Thank you!
Upvotes: 0
Views: 454
Reputation: 1
I just had the same issue. What worked for me was setting TIKTOKEN_CACHE_DIR instead of LLAMA_INDEX_CACHE_DIR.

In my case, llama_index was loading another library, tiktoken:
import tiktoken

enc = tiktoken.get_encoding("gpt2")
And then that checks these two environment variables instead of LLAMA_INDEX_CACHE_DIR:
if "TIKTOKEN_CACHE_DIR" in os.environ:
    cache_dir = os.environ["TIKTOKEN_CACHE_DIR"]
elif "DATA_GYM_CACHE_DIR" in os.environ:
    cache_dir = os.environ["DATA_GYM_CACHE_DIR"]
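To make the precedence explicit, here's a minimal standalone sketch of that lookup (resolve_cache_dir is a hypothetical helper for illustration, not part of tiktoken's API):

```python
import os

def resolve_cache_dir(environ=None):
    # Mirrors tiktoken's lookup order: TIKTOKEN_CACHE_DIR wins over
    # DATA_GYM_CACHE_DIR; LLAMA_INDEX_CACHE_DIR is never consulted.
    environ = os.environ if environ is None else environ
    if "TIKTOKEN_CACHE_DIR" in environ:
        return environ["TIKTOKEN_CACHE_DIR"]
    if "DATA_GYM_CACHE_DIR" in environ:
        return environ["DATA_GYM_CACHE_DIR"]
    return None

print(resolve_cache_dir({"TIKTOKEN_CACHE_DIR": "/tmp/tiktoken",
                         "DATA_GYM_CACHE_DIR": "/tmp/gym"}))  # → /tmp/tiktoken
```

So setting TIKTOKEN_CACHE_DIR (before tiktoken is first imported) redirects that particular download.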
Upvotes: 0