karwi

Reputation: 51

chromadb from_documents AttributeError: type object 'hnswlib.Index' has no attribute 'file_handle_count'

I'm facing an issue when creating embeddings and loading them into a chromadb vector store.

My code:

```
# test with one doc
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

# load data
loader = PyPDFLoader(".XXX.pdf")

# get list of documents (one per page)
pages = loader.load()

# split
text_splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=450,
    chunk_overlap=50,
    length_function=len,
)

pdf_splits = text_splitter.split_documents(pages)  # list of documents
print(pdf_splits[:2])
print(len(pages), len(pdf_splits))

# create a list of texts
text_list = [doc.page_content for doc in pdf_splits]

# embedding
# rm -rf ../chatbot_mvp/vectordb/PMS_research  # removes initial store
embedding = OpenAIEmbeddings(openai_api_key=XXX)
# the return value is not used here; Chroma.from_documents embeds the documents itself
embedding.embed_documents(text_list)

# vector store (this call raises the error)
persist_directory = '../chatbot_mvp/vectordb/XXX/'
vectordb = Chroma.from_documents(
    documents=pdf_splits,
    embedding=embedding,
    persist_directory=persist_directory)
```

It raises the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/Users/karinwiberg/Documents/chatbot_development/chatbot_mvp/chatbot_mvp.ipynb Cell 16 line 3
     33 # vectior store # HELP ERROR
     34 persist_directory = '../chatbot_mvp/vectordb/PMS_research/'
---> 35 vectordb = Chroma.from_documents(
     36     documents=pdf_splits,
     37     embedding=embedding,
     38     persist_directory=persist_directory)

File ~/opt/anaconda3/envs/femai-alicia-prototype-01/lib/python3.9/site-packages/langchain/vectorstores/chroma.py:771, in Chroma.from_documents(cls, documents, embedding, ids, collection_name, persist_directory, client_settings, client, collection_metadata, **kwargs)
    769 texts = [doc.page_content for doc in documents]
    770 metadatas = [doc.metadata for doc in documents]
--> 771 return cls.from_texts(
    772     texts=texts,
    773     embedding=embedding,
    774     metadatas=metadatas,
    775     ids=ids,
    776     collection_name=collection_name,
    777     persist_directory=persist_directory,
    778     client_settings=client_settings,
    779     client=client,
    780     collection_metadata=collection_metadata,
    781     **kwargs,
    782 )
...
--> 445     hnswlib_count = hnswlib.Index.file_handle_count
    446     hnswlib_count = cast(int, hnswlib_count)
    447     # One extra for the metadata file

AttributeError: type object 'hnswlib.Index' has no attribute 'file_handle_count'

I have seen others facing the same problem, but many of them talk about "downgrading" chromadb to 0.4.3, which confuses me. According to the chromadb docs, that version hasn't even been released yet.

My setup: MacBook M2, Python 3.9.18, chromadb 0.4.18.

Another attempt that did not help was the downgrade itself, which reported: Successfully installed chroma-hnswlib-0.7.1 chromadb-0.4.3 fastapi-0.99.1 pydantic-1.10.13

Thanks!

Upvotes: 1

Views: 553

Answers (1)

Tran Minh Quan

Reputation: 982

I checked this, and I don't see any data in the vector DB. You can append the following code to create a new DB from your PDF data.

embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(
    documents=documents,
    embedding=embeddings,
    persist_directory="./db_vector",
)
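
Before recreating the store, it may be worth checking which hnswlib build Python actually imports. The sketch below assumes one possible cause, not confirmed for this setup: a standalone `hnswlib` distribution installed next to `chroma-hnswlib`, both of which provide the `hnswlib` module. The helper names `installed_versions` and `has_file_handle_count` are made up for illustration; only the standard library's `importlib.metadata` is used.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def has_file_handle_count():
    """Return True/False if hnswlib can be imported, None if it cannot."""
    try:
        import hnswlib
    except ImportError:
        return None
    # This is the attribute the traceback says is missing.
    return hasattr(hnswlib.Index, "file_handle_count")


# Assumption: these are the distributions relevant to the error.
print(installed_versions(["chromadb", "chroma-hnswlib", "hnswlib"]))
print("hnswlib.Index.file_handle_count present:", has_file_handle_count())
```

If `hnswlib` shows a version and the attribute check prints `False`, uninstalling the standalone `hnswlib` and reinstalling `chroma-hnswlib` in the same environment is a reasonable next thing to try.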

Upvotes: 1
