Hamza Amri

Reputation: 21

Updating Data in PGVector While Inserting New Data

I'm currently using PostgreSQL with PGVector (through LangChain) and need help managing data insertion efficiently. My goal is to store data in the vector store while ensuring two conditions are met:

  1. Skip inserting data that already exists.
  2. Update data for existing IDs.

Here's a snippet of my code:

cursor.execute("SELECT EXISTS (SELECT FROM information_schema.tables WHERE table_name = 'langchain_pg_embedding')")
table_exists = cursor.fetchone()[0]

if not table_exists:
    print("Vectorstore does not exist in the database.")
    print("Creating Database ...")

    db = PGVector.from_documents(
        embedding=embeddings,
        documents=chunks,
        collection_name=COLLECTION_NAME,
        connection_string=CONNECTION_STRING
    )
    print("Database created successfully")
else:
    print("Vectorstore already exists in the database.")
    print("Checking data ...")

    # Check if the ID already exists in the database
    for chunk in chunks:
        cursor.execute("SELECT * FROM langchain_pg_embedding WHERE langchain_pg_embedding.cmetadata ->> 'id' = %s", (chunk.metadata["id"],))
        result = cursor.fetchall()

        if result:
            print(f"ID {chunk.metadata['id']} already exists in the database.")
            print(result)        

        else:
            print(f"Inserting ID {chunk.metadata['id']} into the database.")
            # Insert the chunk into the database
            db = PGVector.from_documents(
                embedding=embeddings,
                documents=[chunk],
                collection_name=COLLECTION_NAME,
                connection_string=CONNECTION_STRING
            )
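
For the update case (condition 2), a minimal sketch of what I have in mind would replace the `if result:` branch with a delete-then-reinsert, reusing the same from_documents call as above. This assumes the psycopg2 connection behind `cursor` is available as `conn`; it's a rough idea, not tested code:

        if result:
            print(f"ID {chunk.metadata['id']} already exists in the database. Updating ...")
            # Delete the stale embedding row(s) for this ID before re-inserting
            # (`conn` is assumed to be the psycopg2 connection the cursor came from)
            cursor.execute(
                "DELETE FROM langchain_pg_embedding WHERE cmetadata ->> 'id' = %s",
                (chunk.metadata["id"],),
            )
            conn.commit()
            # Re-insert the fresh chunk with the same call used for new IDs
            db = PGVector.from_documents(
                embedding=embeddings,
                documents=[chunk],
                collection_name=COLLECTION_NAME,
                connection_string=CONNECTION_STRING
            )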

Upvotes: 1

Views: 1154

Answers (1)

mike00

Reputation: 448

I don't have a solution for your exact code, but what I'm using for that kind of upsert operation is LangChain's index function:

from langchain.indexes import SQLRecordManager, index

CONNECTION_STRING = "postgresql+psycopg2://admin:[email protected]:9432/vectordb"
COLLECTION_NAME = "vectordb"

namespace = f"pgvector/{COLLECTION_NAME}"
record_manager = SQLRecordManager(
    namespace, db_url=CONNECTION_STRING
)

record_manager.create_schema()

vectorstore = PGVector.from_documents(
    docs,
    embeddings,
    collection_name=COLLECTION_NAME,
    connection_string=CONNECTION_STRING,
)

index(docs, record_manager, vectorstore, cleanup="incremental", source_id_key="source")

After you call index, it checks all embeddings using the metadata 'source' field as the key; if they are already in the database, it deletes them and inserts the new ones. You can play with the different cleanup options such as 'incremental', 'full', etc.
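For example, switching the same call to the 'full' clean-up mode looks like this (a minimal sketch assuming the same docs, record_manager and vectorstore as above):

# With cleanup="full", documents that were indexed earlier but are missing
# from the current `docs` batch are deleted as well, not just changed ones.
index(docs, record_manager, vectorstore, cleanup="full", source_id_key="source")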

Upvotes: 1
