wolfeweeks

Reputation: 445

Trouble deleting ChromaDB documents

I can't seem to delete documents from my Chroma vector database. I would appreciate any insight as to why this example does not work, and what modifications can/should be made to get it functioning correctly.

import dotenv
import os
import chromadb
from chromadb.config import Settings
from chromadb.utils import embedding_functions

dotenv.load_dotenv()

client = chromadb.Client(
    Settings(chroma_db_impl="duckdb+parquet", persist_directory="db/chroma")
)

embedding = embedding_functions.OpenAIEmbeddingFunction(
    api_key=os.getenv("OPENAI_API_KEY"),
    model_name="text-embedding-ada-002",
)

collection = client.get_or_create_collection(name="test", embedding_function=embedding)

from llama_index import SimpleDirectoryReader

documents = SimpleDirectoryReader(
    input_dir="./sampledir",
    recursive=True,
    exclude_hidden=False,
    filename_as_id=True,
).load_data()

collection.add(
    documents=[doc.get_text() for doc in documents],
    ids=[doc.doc_id for doc in documents],
)

print(collection.count())  # PRINTS n

doc_ids = collection.get()["ids"]
collection.delete(ids=doc_ids)

print(collection.count())  # SHOULD BE ZERO, BUT PRINTS n

Upvotes: 3

Views: 6747

Answers (2)

konilse

Reputation: 74

I had the same problem: after deleting some documents, querying my collection returned "None" values. This helped:

from chromadb.api.client import SharedSystemClient

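# Stop the client's underlying system, then drop its cached entry so the
# next client is built fresh. These are private chromadb attributes and
# may change between versions.
client._system.stop()
SharedSystemClient._identifer_to_system.pop(client._identifier, None)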

Adding this after deleting the documents solved my problem 🙂

Upvotes: 0

vaibhav singh

Reputation: 173

This might help anyone looking for how to delete a document in ChromaDB.

Delete by ID

First get the collection (any of the approaches in the documentation will work), then delete by ID:

collection = client.get_collection(name="collection_name")
collection.delete(ids="id_value")

Delete by filtering metadata

collection = client.get_collection(name="collection_emb")
collection.delete(where={'metadata_field': {'<Operator>': '<Value>'}})

For example, if your metadata has the key URL and you want to delete every document whose URL equals www.example.com:

collection.delete(where={'URL': {'$eq': 'www.example.com'}})

Filtering metadata supports the following operators:

$eq - equal to (string, int, float)
$ne - not equal to (string, int, float)
$gt - greater than (int, float)
$gte - greater than or equal to (int, float)
$lt - less than (int, float)
$lte - less than or equal to (int, float)
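
To make the operator semantics concrete, here is a minimal pure-Python sketch of how a where filter of this shape could be evaluated against metadata. This is not ChromaDB's implementation: the helpers matches and ids_to_delete and the sample records are hypothetical, and treating a missing metadata field as "no match" is an assumption of this sketch, not documented ChromaDB behavior.

```python
import operator
from typing import Any, Dict, List

# Map each operator from the list above to a plain Python comparison.
_OPERATORS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def matches(metadata: Dict[str, Any], where: Dict[str, Dict[str, Any]]) -> bool:
    """Return True if every condition in `where` holds for `metadata`.

    Assumption of this sketch: a field missing from `metadata` never matches.
    """
    for field, condition in where.items():
        if field not in metadata:
            return False
        for op_name, expected in condition.items():
            if op_name not in _OPERATORS:
                raise ValueError(f"Unsupported operator: {op_name}")
            if not _OPERATORS[op_name](metadata[field], expected):
                return False
    return True


def ids_to_delete(records: Dict[str, Dict[str, Any]],
                  where: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return the IDs whose metadata satisfies `where`, in insertion order."""
    return [doc_id for doc_id, meta in records.items() if matches(meta, where)]


# Hypothetical sample data: document ID -> metadata.
records = {
    "doc1": {"URL": "www.example.com", "page": 1},
    "doc2": {"URL": "www.example.org", "page": 5},
    "doc3": {"URL": "www.example.com", "page": 9},
}

by_url = ids_to_delete(records, {"URL": {"$eq": "www.example.com"}})
by_page = ids_to_delete(records, {"page": {"$gte": 5}})
print(by_url)
print(by_page)
```

Previewing which IDs a filter selects like this, before passing the same where dict to collection.delete, can be a cheap sanity check that the filter is not broader than intended.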

For more information on filtering by metadata, see the ChromaDB documentation.

Upvotes: 2
