Encomium

Reputation: 277

LangChain embedding PDF error: AttributeError: 'list' object has no attribute 'page_content'

I can't work out why this error occurs when I embed the pages of a PDF with LangChain and OpenAI. Any help is appreciated.

Code:

from langchain.document_loaders import PyPDFLoader
from langchain.vectorstores import FAISS
from langchain.embeddings.openai import OpenAIEmbeddings

loader = PyPDFLoader("FILENAME.pdf")
pages = loader.load_and_split()

docs = loader.load()
texts = [doc.page_content for doc in docs]

# Create embeddings
embedder = OpenAIEmbeddings()  
embeddings = embedder.embed_documents(texts)

# Store in vectorstore
store = FAISS.from_documents(pages, OpenAIEmbeddings())
store.add_documents(embeddings)

Error:

AttributeError: 'list' object has no attribute 'page_content'

Upvotes: 0

Views: 1033

Answers (1)

ZKS

Reputation: 2816

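You need to make two small changes to your code. store.add_documents() expects a list of Document objects, but embeddings is a list of float vectors, and that is what raises the AttributeError. FAISS.from_documents() already embeds the documents and stores them, so the separate embed_documents() and add_documents() calls are not needed. Also load the PDF only once and pass that same list to FAISS:

        from langchain.document_loaders import PyPDFLoader
        from langchain.vectorstores import FAISS
        from langchain.embeddings.openai import OpenAIEmbeddings

        # Load the PDF and split it into Document objects
        loader = PyPDFLoader("FILENAME.pdf")
        docs = loader.load_and_split()

        # Embed the documents and store them in the vector store in one step
        embedder = OpenAIEmbeddings()
        store = FAISS.from_documents(docs, embedder)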
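
To see where the error message comes from, here is a minimal sketch that needs neither LangChain nor an API key. The Document class and the add_documents function below are simplified, hypothetical stand-ins, not LangChain's actual code. The only assumption is that the real add_documents reads page_content from each item it receives, which is what the error message suggests. The vectors are made-up values shaped like the output of embed_documents.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for LangChain's Document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def add_documents(documents):
    """Simplified stand-in for a vector store's add_documents.

    It reads .page_content from every item, so each item must be a Document.
    """
    return [doc.page_content for doc in documents]


docs = [Document("page one"), Document("page two")]

# Made-up vectors with the same shape embed_documents returns:
# one list of floats per input text.
embeddings = [[0.1, 0.2], [0.3, 0.4]]

# Passing Document objects works.
texts = add_documents(docs)

# Passing the raw vectors fails: each item is a plain list,
# and a list has no page_content attribute.
try:
    add_documents(embeddings)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(texts)
print(error_message)
```

So the vectors returned by embed_documents should not be passed to add_documents. Pass Document objects and let the vector store compute the embeddings itself.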

Upvotes: 0
