Reputation: 2991
I have this code where I am able to create an index in OpenSearch:
from os import getenv
from pathlib import Path

from llama_index import GPTVectorStoreIndex, StorageContext, download_loader
from llama_index.vector_stores import OpensearchVectorClient, OpensearchVectorStore

def openes_initiate(file):
    endpoint = getenv("OPENSEARCH_ENDPOINT", "http://localhost:9200")
    # index to demonstrate the VectorStore impl
    idx = getenv("OPENSEARCH_INDEX", "llama-osindex-demo")
    UnstructuredReader = download_loader("UnstructuredReader")
    loader = UnstructuredReader()
    documents = loader.load_data(file=Path(file))
    # OpensearchVectorClient stores text in this field by default
    text_field = "content"
    # OpensearchVectorClient stores embeddings in this field by default
    embedding_field = "embedding"
    # OpensearchVectorClient encapsulates logic for a
    # single opensearch index with vector search enabled
    client = OpensearchVectorClient(
        endpoint, idx, 1536, embedding_field=embedding_field, text_field=text_field
    )
    # initialize vector store
    vector_store = OpensearchVectorStore(client)
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    # initialize an index using our sample data and the client we just created
    index = GPTVectorStoreIndex.from_documents(
        documents=documents, storage_context=storage_context
    )
    return index
The issue I am having is that once I have indexed the data, I am unable to reload the index and serve a query against it. I tried to do this:
def query(index, question):
    query_engine = index.as_query_engine()
    res = query_engine.query(question)
    print(res.response)
where index is the one I created in the first piece of code, but it returns None.
Upvotes: 2
Views: 1858
Reputation: 45
Assuming you have an OpenSearch instance up and running, this is how you would typically go about loading:
To initialize your index:
# using default values
from llama_index.vector_stores import OpensearchVectorClient

endpoint = f"https://{user}:{password}@{hostname}"
idx = "sample-index"
text_field = "content"
embedding_field = "embedding"
client = OpensearchVectorClient(
    endpoint, idx, dim=1536, embedding_field=embedding_field, text_field=text_field
)
This creates an empty index in your OpenSearch database.
To store document embeddings in the index, use:
from llama_index import StorageContext, VectorStoreIndex
from llama_index.vector_stores import OpensearchVectorStore

vector_store = OpensearchVectorStore(client)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
# using a simple VectorStoreIndex
index = VectorStoreIndex.from_documents(
    documents=documents, storage_context=storage_context
)
This will populate your index.
Now, to load the contents from the populated index:
vector_index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
This loads the contents of the vector store into your index, from where you can use it as a query engine, retriever, or chat engine.
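Put together, a reload-and-query round trip might look like the sketch below. It assumes the same field names and dimension as above; the endpoint, index name, and question are placeholders, and it requires a running OpenSearch instance plus an embedding/LLM backend configured for LlamaIndex.

```python
from os import getenv

from llama_index import VectorStoreIndex
from llama_index.vector_stores import OpensearchVectorClient, OpensearchVectorStore

# reconnect to the already-populated index (values are placeholders)
endpoint = getenv("OPENSEARCH_ENDPOINT", "http://localhost:9200")
idx = getenv("OPENSEARCH_INDEX", "sample-index")
client = OpensearchVectorClient(
    endpoint, idx, dim=1536, embedding_field="embedding", text_field="content"
)
vector_store = OpensearchVectorStore(client)

# rebuild the index object from the existing vector store -- no re-ingestion
vector_index = VectorStoreIndex.from_vector_store(vector_store=vector_store)

# query it exactly as if it had just been built
query_engine = vector_index.as_query_engine()
response = query_engine.query("your question here")
print(response.response)
```

Note that from_vector_store does not re-embed anything; it just wraps the existing OpenSearch index so the query engine can retrieve from it.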
Upvotes: 0
Reputation: 529
I think that when storing, you should use something like this:
from llama_index import ServiceContext, VectorStoreIndex

service_context = ServiceContext.from_defaults(
    llm=None,
    embed_model=your_embedding_model,
)
index = VectorStoreIndex.from_documents(
    documents=documents,
    storage_context=storage_context,
    service_context=service_context,
)
After your data has been embedded, you'll need to get the vector store from the OpensearchVectorClient in order to retrieve it. Here's a snippet that can help you with that:
Given:
client = OpensearchVectorClient(
    endpoint, idx, 1536,
    embedding_field=embedding_field,
    text_field=text_field,
)
vector_store = OpensearchVectorStore(client)
Get a VectorStoreIndex from the vector_store:
vsi = VectorStoreIndex.from_vector_store(
    vector_store, service_context=service_context
)
query_engine = vsi.as_query_engine()
res = query_engine.query("your question")
print(res)
This should assist you in retrieving and querying your embedded data.
Upvotes: 0
Reputation: 1
You need to create an OpenSearch client and load the index with VectorStoreIndex.from_vector_store() before you can run a query on it. Your index object is null, which is why the query returns a null result.
Upvotes: 0
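Those two steps can be sketched as follows. The endpoint, index name, field names, and question are assumptions taken from the question's defaults, not fixed values; a live OpenSearch cluster and a configured embedding/LLM backend are required.

```python
from llama_index import VectorStoreIndex
from llama_index.vector_stores import OpensearchVectorClient, OpensearchVectorStore

# 1. create the OpenSearch client, pointing at the existing index
#    (endpoint, index name, and dim=1536 mirror the question's setup)
client = OpensearchVectorClient(
    "http://localhost:9200", "llama-osindex-demo", 1536,
    embedding_field="embedding", text_field="content",
)

# 2. load the index from the vector store before querying it
index = VectorStoreIndex.from_vector_store(OpensearchVectorStore(client))

# only now is `index` non-null and safe to query
print(index.as_query_engine().query("your question"))
```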