Reputation: 445
I can't seem to delete documents from my Chroma vector database. I would appreciate any insight as to why this example does not work, and what modifications can/should be made to get it functioning correctly.
import dotenv
import os
import chromadb
from chromadb.config import Settings
from chromadb.utils import embedding_functions
dotenv.load_dotenv()
client = chromadb.Client(
    Settings(chroma_db_impl="duckdb+parquet", persist_directory="db/chroma")
)
embedding = embedding_functions.OpenAIEmbeddingFunction(
    api_key=os.getenv("OPENAI_API_KEY"),
    model_name="text-embedding-ada-002",
)
collection = client.get_or_create_collection(name="test", embedding_function=embedding)
from llama_index import SimpleDirectoryReader
documents = SimpleDirectoryReader(
    input_dir="./sampledir",
    recursive=True,
    exclude_hidden=False,
    filename_as_id=True,
).load_data()
collection.add(
    documents=[doc.get_text() for doc in documents],
    ids=[doc.doc_id for doc in documents],
)
print(collection.count()) # PRINTS n
doc_ids = collection.get()["ids"]
collection.delete(ids=doc_ids)
print(collection.count()) # SHOULD BE ZERO, BUT PRINTS n
Upvotes: 3
Views: 6747
Reputation: 74
I had the same problem: after deleting some documents, querying my collection returned "None" values. This helped:
from chromadb.api.client import SharedSystemClient
client._system.stop()
SharedSystemClient._identifer_to_system.pop(client._identifier, None)
Adding this right after deleting the documents solved my problem 🙂
Upvotes: 0
Reputation: 173
This might help anyone looking to delete a document in ChromaDB.
First get the collection, then delete by ID or by a metadata filter, as described in the documentation:
collection = client.get_collection(name="collection_name")
collection.delete(ids="id_value")
collection = client.get_collection(name="collection_emb")
collection.delete(where={'metadata_field': {'<Operator>': '<Value>'}})
If your metadata has the key URL and you want to delete every document whose URL equals www.example.com:
collection.delete(where={'URL': {'$eq': 'www.example.com'}})
Filtering metadata supports the following operators:
$eq - equal to (string, int, float)
$ne - not equal to (string, int, float)
$gt - greater than (int, float)
$gte - greater than or equal to (int, float)
$lt - less than (int, float)
$lte - less than or equal to (int, float)
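Since a mistyped operator or a wrong value type is easy to miss, here is a minimal sketch of a helper that builds a single-field `where` filter and checks it against the operator list above before you pass it to `collection.delete`. The name `build_where` and the `ALLOWED_TYPES` table are hypothetical (they are not part of Chroma), and Chroma's own validation may differ from this:

```python
from typing import Union

Scalar = Union[str, int, float]

# Operators and the value types each one accepts, copied from the list above.
ALLOWED_TYPES = {
    "$eq": (str, int, float),
    "$ne": (str, int, float),
    "$gt": (int, float),
    "$gte": (int, float),
    "$lt": (int, float),
    "$lte": (int, float),
}


def build_where(field: str, operator: str, value: Scalar) -> dict:
    """Hypothetical helper: build a one-field `where` filter dict."""
    if operator not in ALLOWED_TYPES:
        raise ValueError(f"unsupported operator: {operator!r}")
    # bool is a subclass of int in Python; the list above does not mention
    # booleans, so this sketch rejects them rather than guess.
    if isinstance(value, bool) or not isinstance(value, ALLOWED_TYPES[operator]):
        raise TypeError(f"{operator} does not accept {type(value).__name__} values")
    return {field: {operator: value}}


where = build_where("URL", "$eq", "www.example.com")
print(where)  # {'URL': {'$eq': 'www.example.com'}}
```

The resulting dict has the same shape as the filter in the example above, so it could be passed as `collection.delete(where=where)`.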
For more information on filtering by metadata, see the Chroma documentation.
Upvotes: 2