Reputation: 1

LLM unable to reliably retrieve nodes or information from a knowledge graph using LangChain

I have a very simple knowledge graph (a toy version of the one I am actually working with, but I see similar issues with both) built from a table. The graph was created using NetworkxEntityGraph() and a series of for loops, as follows:

import pandas as pd
import numpy as np

import os
import time
from langchain.chains import GraphQAChain
from langchain_community.graphs.networkx_graph import NetworkxEntityGraph

import networkx as nx
from langchain.llms import OpenAI

from langchain.chains import RetrievalQA

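# Does this package still work? (Importing it fails in recent LangChain releases,
# so it is commented out here to keep the script runnable.)
# from langchain.retrievers import GraphRetriever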

# Manage the API keys
from dotenv import load_dotenv

load_dotenv(dotenv_path='./API_KEYs.env')

# Google
from langchain_google_genai import GoogleGenerativeAI
import google.generativeai as genai

GOOGLE_API_KEY = os.getenv('GOOGLE_API_KEY')
genai.configure(api_key=GOOGLE_API_KEY)
llm = GoogleGenerativeAI(model='gemini-pro', google_api_key=GOOGLE_API_KEY)

fake_data = {'entity_a': ['Joanie', 'Johny', 'I'],
            'entity_b': ['Chachi', 'Here', 'Jenn-ay'],
            'interaction': ['loves', 'Where_he_is', 'still_loves']}

fake_data = pd.DataFrame(fake_data)

G = NetworkxEntityGraph()

#### ---- Add nodes ---- ####
for id, row in fake_data.iterrows():
    G._graph.add_node(row['entity_a'])
    G._graph.add_node(row['entity_b'])

#### ---- Add edges ---- ####
for id, row in fake_data.iterrows():
    G._graph.add_edge(
        row['entity_a'],
        row['entity_b'],
        relation=row['interaction']
    )

My goal is to ask the LLM questions about the nodes in the graph and the relations between them. As of now, the LLM does not seem to understand my graph, or is unable to extract nodes from it. I may be missing proper retrieval functions, but the same functions seem to work for other people who have used them and posted tutorials online, even in the past couple of weeks. For example, this YouTube video was released a couple of weeks ago, uses very similar code, and seems to work well for its author: GraphRAG using CSV file and LangChain

I am able to query the graph directly and see that the nodes and edges are input into the graph correctly:

neighbors = list(G._graph.neighbors("Joanie"))
related_edges = list(G._graph.edges("Joanie", data=True))
print("Neighbors:", neighbors)
print("Related Edges:", related_edges)

Neighbors: ['Chachi']
Related Edges: [('Joanie', 'Chachi', {'relation': 'loves'})]
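For context on what the chain can see: NetworkxEntityGraph() wraps a directed networkx graph (nx.DiGraph) by default, and in the langchain_community versions I have looked at, GraphQAChain builds its context with get_entity_knowledge(), which walks outgoing edges from each extracted entity (depth 1 by default) and renders each edge as "subject relation object". The stdlib sketch below mimics that lookup on a plain dict so the exact text the LLM would receive is visible. The dict layout and the helpers `build_graph` and `entity_knowledge` are illustrative assumptions, not LangChain code.

```python
# Hypothetical stand-in for NetworkxEntityGraph's directed adjacency:
# graph[subject][object] = {"relation": predicate}
def build_graph(rows):
    graph = {}
    for subject, obj, relation in rows:
        graph.setdefault(subject, {})[obj] = {"relation": relation}
        graph.setdefault(obj, {})  # make sure the object also exists as a node
    return graph


def entity_knowledge(graph, entity):
    """Return 'subject relation object' strings for the entity's outgoing edges (depth 1)."""
    if entity not in graph:
        return []
    return [f"{entity} {data['relation']} {obj}" for obj, data in graph[entity].items()]


rows = [
    ("Joanie", "Chachi", "loves"),
    ("Johny", "Here", "Where_he_is"),
    ("I", "Jenn-ay", "still_loves"),
]
graph = build_graph(rows)

print(entity_knowledge(graph, "Joanie"))  # ['Joanie loves Chachi']
print(entity_knowledge(graph, "Chachi"))  # [] -> no outgoing edges in a directed graph
print(entity_knowledge(graph, "Johny"))   # ['Johny Where_he_is Here']
print(entity_knowledge(graph, "joanie"))  # [] -> node names are case-sensitive
```

Two consequences follow if the real lookup behaves like this: a question centred on an object node such as Chachi gets no context at all, and a relation name like Where_he_is is passed to the LLM literally, which reads oddly as text.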

However, when I put the LLM to work, it seems unable to understand my graph, or at least is not able to extract nodes from the graph:

chain = GraphQAChain.from_llm(
    llm=llm,
    graph=G,
    verbose=True
)

question = "Tell me about Joanie's relation to the other people in the graph."

chain.invoke(question)

> Entering new GraphQAChain chain...
Entities Extracted:
NONE
Full Context:


> Finished chain.
{'query': "Tell me about Joanie's relation to the other people in the graph.",
 'result': 'This context does not mention anything about Joanie or her relation to the other people in the graph, so I cannot answer this question from the provided context.'}

The same is true even for a much simpler question about the graph:

question = "Is Joanie in this graph?"

chain.invoke(question)

> Entering new GraphQAChain chain...
Entities Extracted:
NONE
Full Context:


> Finished chain.
{'query': 'Is Joanie in this graph?', 'result': "I don't know."}
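Both runs fail at the same step. The chain first asks the LLM to list the entities in the question and only looks up nodes whose names come back; in the langchain_community source, a reply of exactly NONE is treated as "no entities", and anything else is split on commas. With no entities the context is empty, so the answering step can only say it doesn't know. The sketch below reproduces that flow in plain Python, with a precomputed `KNOWLEDGE` table standing in for the graph lookup (an assumption for illustration), to show that the empty "Full Context" follows directly from the NONE extraction rather than from the graph.

```python
# Precomputed 'subject relation object' lines per node (stand-in for the graph lookup)
KNOWLEDGE = {
    "Joanie": ["Joanie loves Chachi"],
    "Johny": ["Johny Where_he_is Here"],
    "I": ["I still_loves Jenn-ay"],
}


def parse_entities(llm_reply):
    """Assumed mirror of the chain's parsing: 'NONE' means no entities, else comma-separated."""
    if llm_reply.strip() == "NONE":
        return []
    return [name.strip() for name in llm_reply.split(",")]


def build_context(llm_reply):
    """Concatenate the knowledge lines of every parsed entity, one per line."""
    lines = []
    for entity in parse_entities(llm_reply):
        lines.extend(KNOWLEDGE.get(entity, []))
    return "\n".join(lines)


print(repr(build_context("NONE")))            # '' -> the empty Full Context above
print(repr(build_context("Joanie")))          # 'Joanie loves Chachi'
print(repr(build_context("Joanie, Chachi")))  # 'Joanie loves Chachi'
```

In other words, the thing to change is the extraction step (the model, or the `entity_prompt` that GraphQAChain.from_llm accepts) rather than the graph.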

Another thing I have tried is using GraphRetriever (from langchain.retrievers import GraphRetriever) directly, as described in this tutorial: Building a Graph RAG System from Scratch with LangChain: A Comprehensive Tutorial. However, LangChain seems to have updated the package recently: as far as I can find, GraphRetriever no longer exists in langchain.retrievers, and I am not sure what replaced that functionality.
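While GraphRetriever is unavailable, one way to check whether the problem is the LLM's entity extraction is to skip that step entirely and match node names against the question with plain string matching. The helpers below (`mentioned_nodes` and `retrieve_context`, hypothetical names, not a LangChain API) do that on a list of (subject, relation, object) triples and include edges in both directions; the returned text could then be handed to the LLM as context in a prompt of your own.

```python
import re

TRIPLES = [
    ("Joanie", "loves", "Chachi"),
    ("Johny", "Where_he_is", "Here"),
    ("I", "still_loves", "Jenn-ay"),
]


def mentioned_nodes(question, triples):
    """Return node names that appear as whole words in the question (case-insensitive)."""
    nodes = {name for subj, _, obj in triples for name in (subj, obj)}
    return sorted(
        name for name in nodes
        if re.search(rf"\b{re.escape(name)}\b", question, re.IGNORECASE)
    )


def retrieve_context(question, triples):
    """Return 'subject relation object' lines touching any mentioned node, in or out."""
    hits = set(mentioned_nodes(question, triples))
    lines = [f"{s} {r} {o}" for s, r, o in triples if s in hits or o in hits]
    return "\n".join(lines)


print(retrieve_context("Is Joanie in this graph?", TRIPLES))
print(retrieve_context("Who loves chachi?", TRIPLES))
print(retrieve_context("What do I know about Johny?", TRIPLES))
```

This is deliberately crude: a node literally named "I" matches any question containing the standalone pronoun (the last call pulls in the I triple alongside Johny's), and spelling variants such as "Jennay" versus "Jenn-ay" will not match.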

Is anyone familiar with the latest LangChain updates who can help me understand what I am missing here?

Thank you very much!

UPDATE: I am still researching ways to answer this question, but a few days later I still have not found a working solution. Any guidance, including reading materials, would be greatly appreciated!

UPDATE 2: Today I got the response below, which is almost progress: the entity is now extracted and the correct triple is retrieved, but the LLM still fails to answer from the networkx knowledge graph:

question = "Tell me about Joanie's relation to the other people in the graph."

chain.invoke(question)

> Entering new GraphQAChain chain...
Entities Extracted:
Joanie
Full Context:
Joanie loves Chachi

> Finished chain.
{'query': "Tell me about Joanie's relation to the other people in the graph.",
 'result': "I don't know."}
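In this run retrieval worked (the context contains the right triple), so the remaining failure is in the answering step. GraphQAChain.from_llm accepts a `qa_prompt` with `context` and `question` variables, and the default one tells the model to say it doesn't know when unsure, so a template that explains the triple format more explicitly may help some models. The sketch below only shows, with the standard library, what such a filled prompt would look like; the wording of `QA_TEMPLATE` is my own assumption, not LangChain's default prompt.

```python
from string import Template

# Hypothetical, more explicit QA template (not LangChain's default prompt)
QA_TEMPLATE = Template(
    "You are given facts from a knowledge graph. Each line has the form\n"
    "'subject relation object', e.g. 'Alice knows Bob' means Alice has the\n"
    "relation 'knows' to Bob. Underscores in a relation stand for spaces.\n"
    "Answer the question using only these facts.\n\n"
    "Facts:\n$context\n\n"
    "Question: $question\n"
    "Answer:"
)


def fill_prompt(context, question):
    """Substitute the retrieved triples and the user question into the template."""
    return QA_TEMPLATE.substitute(context=context, question=question)


prompt = fill_prompt(
    "Joanie loves Chachi",
    "Tell me about Joanie's relation to the other people in the graph.",
)
print(prompt)
```

To try it with the chain, the same text would be written with {context} and {question} placeholders in a LangChain PromptTemplate and passed as qa_prompt=... to GraphQAChain.from_llm.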

Upvotes: -1

Answers (2)

Pascal Louis-Marie

Reputation: 252

I was able to get it running using an open-source model. Not all models behave the same! Below is the code I used; as you have experienced, many libraries have been deprecated in the meantime, so we have to use newer ones.

import networkx as nx
from langchain_community.graphs.index_creator import GraphIndexCreator
from langchain_openai import ChatOpenAI

api_key = '<_groq_api_here>'  # https://console.groq.com/keys

llm = ChatOpenAI(
    openai_api_base="https://api.groq.com/openai/v1", 
    openai_api_key=api_key,
    model_name="llama3-70b-8192", #https://console.groq.com/docs/models
)

index_creator = GraphIndexCreator(llm=llm)

#Reading nodes and edges from text
docs1 = [
'Joanie loves Chachi',
'Johny is here',
'I still love Jennay'
]

text1 = '\n'.join(docs1)

# Create a knowledge graph: from_text() returns a NetworkxEntityGraph,
# so there is no need to create an nx.Graph() first
graph = index_creator.from_text(text1)

# created graph inspection
print('Triples output:',graph.get_triples())

# Graph can be saved and loaded later on
"""
print('saving the graph ->','graph1.gml')
graph.write_to_gml("graph1.gml")
"""

# if it was saved, you do not have to build the graph each single time, you can load a graph previously built 
"""
from langchain_community.graphs import NetworkxEntityGraph
graph = NetworkxEntityGraph.from_gml("graph1.gml")
print(graph.get_triples())
print(graph.get_number_of_nodes())
"""

#Retrieval
from langchain.chains import GraphQAChain
chain = GraphQAChain.from_llm(llm, graph=graph, verbose=True)

query1='Is Joanie part of the input data?'
query2="Tell me about Joanie's relation to the other people?"
print(chain.invoke(query1))
print(chain.invoke(query2))

#Optional drawing - using pyvis
"""
from pyvis.network import Network

# Assuming `graph` is the Networkx graph you created with GraphIndexCreator
G = graph._graph

# Convert to PyVis graph
net = Network(notebook=True)
net.from_nx(G)

# Display the graph
net.show("graph1.html")
"""

And here is the output I obtained:

Triples output: [('Joanie', 'Chachi', 'loves'), ('Johny', 'here', 'is'), ('I', 'Jennay', 'love')]


> Entering new GraphQAChain chain...
Entities Extracted:
Joanie
Full Context:
Joanie loves Chachi

> Finished chain.
{'query': 'Is Joanie part of the input data?', 'result': 'Yes, Joanie is part of the input data.'}

> Entering new GraphQAChain chain...
Entities Extracted:
Joanie
Full Context:
Joanie loves Chachi

> Finished chain.
{'query': "Tell me about Joanie's relation to the other people?", 'result': 'Based on the triplet "Joanie loves Chachi", I can infer that Joanie has a romantic relationship with Chachi. That\'s all I can say about Joanie\'s relation to other people based on this single triplet.'}
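The answer hardcodes the three sentences. To feed the question's table into GraphIndexCreator.from_text instead, each row can be rendered as one sentence; the helper below (`rows_to_text`, a hypothetical name) does that in plain Python on the same columns, replacing underscores in the relation names so the LLM sees readable text. Keep in mind that the LLM then re-extracts the triples, so the resulting graph may not match the table exactly: in the output above, "I still love Jennay" became the triple ('I', 'Jennay', 'love').

```python
# Same columns as the question's fake_data table
fake_data = {
    "entity_a": ["Joanie", "Johny", "I"],
    "entity_b": ["Chachi", "Here", "Jenn-ay"],
    "interaction": ["loves", "Where_he_is", "still_loves"],
}


def rows_to_text(table):
    """Render each row as 'entity_a interaction entity_b', one sentence per line."""
    sentences = []
    for a, b, relation in zip(table["entity_a"], table["entity_b"], table["interaction"]):
        sentences.append(f"{a} {relation.replace('_', ' ')} {b}")
    return "\n".join(sentences)


text = rows_to_text(fake_data)
print(text)
# With the answer's setup, this text would then be passed on:
# graph = index_creator.from_text(text)
```

Because iterating a pandas column yields its values, the same helper should also accept the question's pd.DataFrame unchanged.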

Upvotes: 0

Topher

Reputation: 1

Well, so far the only thing that has worked for me is switching from Gemini to OpenAI. This is a little discouraging, because I would like to use open-source models in the future, and I don't like seeing that the LangChain setup works better with some models than others. I am also confused as to why the code seemed to work for the gentleman in the YouTube video but was almost completely unresponsive for me.

I am also very confused as to why from langchain.retrievers import GraphRetriever raised errors, or rather why GraphRetriever is no longer in LangChain. It seems to have worked for others very recently, and I am not sure why it was dropped. Perhaps the code as I have written it is meant to replace this functionality? If anyone has ideas as to why this is, or has other answers, please respond.

The only changes that I made in my code are as follows:

import pandas as pd
import numpy as np

import os
import time

# Manage the API keys
from dotenv import load_dotenv

load_dotenv(dotenv_path='./API_KEYs.env')

# ChatGPT
# OpenAI must be imported before it is used below
# (in newer releases: from langchain_openai import OpenAI)
from langchain.llms import OpenAI

ORG_ID = os.getenv('ORG_ID')
API_KEY = os.getenv('API_KEY')

llm = OpenAI(openai_api_key=API_KEY, openai_organization=ORG_ID, temperature=0)

# Google
#from langchain_google_genai import GoogleGenerativeAI
#import google.generativeai as genai

#GOOGLE_API_KEY = os.getenv('GOOGLE_API_KEY')
#genai.configure(api_key=GOOGLE_API_KEY)
#llm = GoogleGenerativeAI(model='gemini-pro', google_api_key=GOOGLE_API_KEY)


import networkx as nx
from langchain.chains import GraphQAChain
from langchain_community.graphs.networkx_graph import NetworkxEntityGraph
#from langchain.chains import RetrievalQA
#from langchain.retrievers import GraphRetriever

With this change, I not only get a response from the LLM, but also much more detailed and accurate answers about my knowledge graph.

Upvotes: -1
