Reputation: 51
I'm having trouble creating embeddings and loading them into a chromadb vector store.
My code:
```python
# test with one doc
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

# load data
loader = PyPDFLoader(".XXX.pdf")
# get list of documents
pages = loader.load()

# split
text_splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=450,
    chunk_overlap=50,
    length_function=len,
)
#print(page)
pdf_splits = text_splitter.split_documents(pages)  # list of documents
print(pdf_splits[:2])
print(len(pages), len(pdf_splits))

# create a list of texts
text_list = []
for doc in pdf_splits:
    text_list.append(doc.page_content)

# embedding
#rm -rf ../chatbot_mvp/vectordb/PMS_research # removes initial store
embedding = OpenAIEmbeddings(openai_api_key=XXX)
embedding.embed_documents(text_list)

# vectior store # HELP ERROR
persist_directory = '../chatbot_mvp/vectordb/XXX/'
vectordb = Chroma.from_documents(
    documents=pdf_splits,
    embedding=embedding,
    persist_directory=persist_directory)
```
It gives the following error:
```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/Users/karinwiberg/Documents/chatbot_development/chatbot_mvp/chatbot_mvp.ipynb Cell 16 line 3
     33 # vectior store # HELP ERROR
     34 persist_directory = '../chatbot_mvp/vectordb/PMS_research/'
---> 35 vectordb = Chroma.from_documents(
     36     documents=pdf_splits,
     37     embedding=embedding,
     38     persist_directory=persist_directory)

File ~/opt/anaconda3/envs/femai-alicia-prototype-01/lib/python3.9/site-packages/langchain/vectorstores/chroma.py:771, in Chroma.from_documents(cls, documents, embedding, ids, collection_name, persist_directory, client_settings, client, collection_metadata, **kwargs)
    769 texts = [doc.page_content for doc in documents]
    770 metadatas = [doc.metadata for doc in documents]
--> 771 return cls.from_texts(
    772     texts=texts,
    773     embedding=embedding,
    774     metadatas=metadatas,
    775     ids=ids,
    776     collection_name=collection_name,
    777     persist_directory=persist_directory,
    778     client_settings=client_settings,
    779     client=client,
    780     collection_metadata=collection_metadata,
    781     **kwargs,
    782 )
...
--> 445 hnswlib_count = hnswlib.Index.file_handle_count
    446 hnswlib_count = cast(int, hnswlib_count)
    447 # One extra for the metadata file

AttributeError: type object 'hnswlib.Index' has no attribute 'file_handle_count'
```
I have seen others facing the same problem, but many of them talk about "downgrading" chromadb to 0.4.3, which confuses me. According to the chromadb docs, that version hasn't even been released yet.
My setup is as follows: MacBook M2, Python 3.9.18, chromadb 0.4.18.
Another unsuccessful attempt was installing chromadb 0.4.3: Successfully installed chroma-hnswlib-0.7.1 chromadb-0.4.3 fastapi-0.99.1 pydantic-1.10.13
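Since the notebook may be running against a different environment than the one pip installed into, a minimal diagnostic sketch like the one below can show which versions are actually active and whether the attribute named in the traceback exists. The helper names `installed_version` and `hnswlib_has_file_handle_count` are made up for illustration. Checking the plain `hnswlib` distribution alongside `chroma-hnswlib` rests on the assumption that both install an `hnswlib` module and that one may shadow the other, which is a possible cause here and not a confirmed one.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def hnswlib_has_file_handle_count():
    """Return True/False for the attribute from the traceback, or None if hnswlib is not importable."""
    try:
        import hnswlib
    except ImportError:
        return None
    return hasattr(hnswlib.Index, "file_handle_count")


# Run this in the same notebook kernel that raises the error.
for name in ("chromadb", "chroma-hnswlib", "hnswlib", "langchain"):
    print(name, installed_version(name))
print("hnswlib.Index.file_handle_count present:", hnswlib_has_file_handle_count())
```

If the last line prints `False`, the `hnswlib` module being imported is not the one chromadb expects, and the versions printed above it indicate which distributions to look at.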
Thanks!
Upvotes: 1
Views: 553
Reputation: 982
I checked it and I don't see any data in the vector DB. You can append the following code to create a new DB from your PDF data:
```python
# `documents` is the list of split documents (`pdf_splits` in the question)
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(
    documents=documents,
    embedding=embeddings,
    persist_directory="./db_vector",
)
```
Upvotes: 1