Yeti123

Reputation: 49

How to delete documents in LangChain vectorstore

I am following LangChain's tutorial to create an example selector to automatically select similar examples given an input.

example_selector = SemanticSimilarityExampleSelector.from_examples(
    # This is the list of examples available to select from.
    examples, 
    # This is the embedding class used to produce embeddings which are used to measure semantic similarity.
    OpenAIEmbeddings(), 
    # This is the VectorStore class that is used to store the embeddings and do a similarity search over.
    Chroma, 
    # This is the number of examples to produce.
    k=1
)

I passed my documents in as examples, but then realized that some of them trigger an OpenAI content filtering error, so I want to remove them from the vectorstore. I couldn't figure out how to do that. As a workaround I recreated my example documents and the example selector from scratch, but I would love to learn whether there is a way to remove embeddings from an existing vectorstore.

Upvotes: 4

Views: 15758

Answers (2)

PhoneRoutine

Reputation: 31

If you need to delete all the documents in the vector store, you can also do this (note that it goes through the store's private `_client` and `_collection` attributes):

vectorstore._client.delete_collection(vectorstore._collection.name)
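
As a rough sketch of the same idea, the helper below prefers a public `delete_collection()` method and only falls back to the private-attribute call above. It assumes the installed LangChain Chroma wrapper exposes `delete_collection()`; `clear_vectorstore` is a hypothetical name used for illustration, not part of any library.

```python
def clear_vectorstore(vectorstore):
    """Drop the whole collection behind a Chroma-style vector store."""
    if hasattr(vectorstore, "delete_collection"):
        # Public wrapper method; assumed to exist on the installed version.
        vectorstore.delete_collection()
    else:
        # Fallback: the private-attribute call shown in this answer.
        vectorstore._client.delete_collection(vectorstore._collection.name)
```

With the selector from the question this would presumably be called as `clear_vectorstore(example_selector.vectorstore)`. Once the collection is gone, the store will most likely need to be rebuilt before it can be queried again.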

Upvotes: 0

carteakey

Reputation: 364

Since you appear to be using ChromaDB, you can use the delete method it provides. Most of the vector store integrations in LangChain should have a delete method as well.

Below is an example from LangChain's official docs (https://python.langchain.com/docs/integrations/vectorstores/chroma#update-and-delete):

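# `docs`, `embedding_function`, and `query` are assumed to be defined earlier, as on the linked docs page

# create simple ids
ids = [str(i) for i in range(1, len(docs) + 1)]

# add data
example_db = Chroma.from_documents(docs, embedding_function, ids=ids)
# use a separate name so the original `docs` list is not overwritten
results = example_db.similarity_search(query)
print(results[0].metadata)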

# delete the last document
print("count before", example_db._collection.count())
example_db._collection.delete(ids=[ids[-1]])
print("count after", example_db._collection.count())
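
For the example selector in the question, the ids are not chosen by hand, so one possible approach is to look them up first and then delete by id. The sketch below assumes the vector store exposes `get()` (returning a dict with `"ids"` and `"metadatas"`) and `delete(ids=...)`, as the LangChain Chroma wrapper appears to, and that the selector stores each example dict as metadata. `remove_examples` and the `"input"` key are hypothetical names for illustration.

```python
def remove_examples(vectorstore, should_remove):
    """Delete every stored entry whose metadata satisfies should_remove.

    Returns the list of ids that were deleted.
    """
    # Assumed shape: {"ids": [...], "metadatas": [...], ...}
    stored = vectorstore.get()
    doomed = [
        doc_id
        for doc_id, meta in zip(stored["ids"], stored["metadatas"])
        if should_remove(meta or {})
    ]
    # Skip the call entirely when nothing matched, to avoid passing an empty id list.
    if doomed:
        vectorstore.delete(ids=doomed)
    return doomed
```

A call might look like `remove_examples(example_selector.vectorstore, lambda ex: ex.get("input") == "text that trips the filter")`, assuming the selector keeps its store in a `vectorstore` attribute.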

Upvotes: 5
