Issue with Storing and Loading a Timescale Vector Index in LlamaIndex

Question

I'm currently working with the llama_index Python package and using the llama-index-vector-stores-timescalevector extension to manage my vectors with Timescale. However, I’ve encountered an issue where I’m unable to store the index for future use, which means I have to recreate it every time I run my code. This is quite inefficient and not ideal for my use case.

I followed this tutorial: TimescaleVector Example, but it doesn't mention how to store and later load the index.

Here’s a snippet of my setup. The CSV is available at this link.

pip install llama_index llama-index-vector-stores-postgres llama-index-embeddings-openai llama-index-vector-stores-timescalevector

import llama_index
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.core.vector_stores import VectorStoreQuery, MetadataFilters
from llama_index.core.schema import TextNode, NodeRelationship, RelatedNodeInfo
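from llama_index.vector_stores.timescalevector import TimescaleVectorStore
from timescale_vector import client as timescale_client  # provides uuid_from_time, used in create_uuid2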
from llama_index.embeddings.openai import OpenAIEmbedding
import pandas as pd
import os
import time
from datetime import datetime, timedelta

# API keys and paths hidden for security
os.environ["OPENAI_API_KEY"] = 'your_openai_api_key'
os.environ["TIMESCALE_SERVICE_URL"] = 'your_timescale_service_url'

# Load and process data
reuters = pd.read_csv('your_file_path')
reuters.columns = ["title", "date", "description"]

# Function to take in a date string in the past and return a uuid v1
def create_uuid2(date_string: str):
    if date_string is None:
        return None
    time_format = '%b %d %Y'
    datetime_obj = datetime.strptime(date_string, time_format)
    uuid = timescale_client.uuid_from_time(datetime_obj)
    return str(uuid)

def create_date2(input_string: str) -> str | None:
    if input_string is None:
        return None
    # Convert the string to a datetime object using strptime
    date_object = datetime.strptime(input_string, '%b %d %Y')

    # Define the time as midnight and the desired timezone offset
    # (named time_of_day so it does not shadow the imported time module)
    time_of_day = "00:00:00"
    timezone_hours = 8
    timezone_minutes = 50

    # Create the formatted timestamp-with-timezone string
    timestamp_tz_str = f"{date_object.year}-{date_object.month:02}-{date_object.day:02} {time_of_day}+{timezone_hours:02}{timezone_minutes:02}"
    return timestamp_tz_str



# Create a Node object from a single row of data
def create_node2(row):
    record = row.to_dict()
    record_content = (
        record["date"]
        + " "
        + record["title"]
        + " "
        + record["description"]
    )
    # Can change to TextNode as needed
    node = TextNode(
        id_=create_uuid2(str(record["date"])),
        text=record_content,
        metadata={
            "title": record["title"],
            "date": create_date2(str(record["date"])),
        },
    )

    return node


# Create nodes and embeddings
nodes = [create_node2(row) for _, row in reuters.iterrows()]
embedding_model = OpenAIEmbedding()

# Add nodes to Timescale Vector Store
ts_vector_store = TimescaleVectorStore.from_params(
    service_url=os.environ["TIMESCALE_SERVICE_URL"],
    table_name="reuters_test"
)
_ = ts_vector_store.add(nodes[:100])

# Tried with this function. It runs but I don't know where the index is saved
ts_vector_store.create_index("aaa")
# Also, attempt to store the index (currently not working as expected)
storage_context = StorageContext.from_defaults(persist_dir="your_persist_dir")
index.storage_context.persist(persist_dir="your_persist_dir")  # not clear how to retrieve the index variable

from llama_index.core import load_index_from_storage

# load a single index
# need to specify index_id if multiple indexes are persisted to the same directory
index = load_index_from_storage(storage_context)

This is the error I get when calling load_index_from_storage:

KeyError                                  Traceback (most recent call last)
 in ()
      4     load_graph_from_storage,
      5 )
----> 6 index = load_index_from_storage(storage_context)

4 frames
/usr/local/lib/python3.10/dist-packages/llama_index/core/storage/storage_context.py in vector_store(self)
    262     def vector_store(self) -> BasePydanticVectorStore:
    263         """Backwrds compatibility for vector_store property."""
--> 264         return self.vector_stores[DEFAULT_VECTOR_STORE]
    265 
    266     def add_vector_store(

KeyError: 'default'

Does anyone have experience with the llama-index-vector-stores-timescalevector package? How can I properly store and reload the index to avoid having to recreate it each time? Any guidance on the correct method or any relevant documentation would be greatly appreciated.

I expected to be able to store the index and later reload it without needing to recreate it from scratch.
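
For reference, here is a minimal sketch of the reload pattern I am after. It assumes the vectors added with ts_vector_store.add(...) already live in the Timescale table, so nothing needs to be persisted to a local directory, and that VectorStoreIndex.from_vector_store is the intended way to rebuild an index object on top of an existing table. The helper name load_existing_index is hypothetical, and I have not confirmed this is the recommended approach for this package.

```python
import os


def load_existing_index(table_name: str = "reuters_test"):
    """Reconnect to vectors already stored in Timescale instead of re-adding them.

    load_existing_index is a hypothetical helper, not part of llama_index.
    Imports are deferred so that defining the function needs neither the
    llama-index packages nor a database connection.
    """
    from llama_index.core import VectorStoreIndex
    from llama_index.embeddings.openai import OpenAIEmbedding
    from llama_index.vector_stores.timescalevector import TimescaleVectorStore

    # Assumption: the table was populated in an earlier run via ts_vector_store.add(...)
    vector_store = TimescaleVectorStore.from_params(
        service_url=os.environ["TIMESCALE_SERVICE_URL"],
        table_name=table_name,
    )

    # Wrap the existing table in an index object; no nodes are re-embedded here.
    return VectorStoreIndex.from_vector_store(
        vector_store=vector_store,
        embed_model=OpenAIEmbedding(),
    )
```

With this, index = load_existing_index() followed by index.as_query_engine() would take the place of the load_index_from_storage call. As far as I can tell, StorageContext.from_defaults(persist_dir=...) only reads files from a local directory, which may be why no 'default' vector store is found.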

