John mick

Reputation: 21

LangChain LLM model that queries and responds based on JSON

I'm planning to develop a LangChain application that takes user input and provides a URL related to the request.

My data is in JSON (it's around 35 pages):

{ page_name:{data:"",url:""}, .. }

I tried using RAG, but it didn't work:

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain_google_genai import GoogleGenerativeAIEmbeddings

# documents, llm, and GOOGLE_API_KEY are defined elsewhere in my code
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001", google_api_key=GOOGLE_API_KEY)

# Split the documents, embed them into Chroma, and expose a retriever
all_splits = text_splitter.split_documents(documents)
vectordb = Chroma.from_documents(documents=all_splits, embedding=embeddings, persist_directory="chroma_db")
retriever = vectordb.as_retriever()

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    verbose=True
)

qa.run('about this website')

I also tried combining data and url, but it didn't work correctly:

# Flatten each page into a single string that mixes the URL and the text
data_dict = {}
for name, info in data.items():
    print(f'Name: {name}, URL: {info["url"]}, Data: {info["data"]}')
    data_dict[name] = f'URL: {info["url"]}, Data: {info["data"]}'
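One possible reason the combined strings retrieve poorly is that the URL text is embedded along with the page text. An alternative (a sketch of my own, not from the post above, with a made-up sample URL) is to embed only the page text and carry the URL as metadata, so it can be returned verbatim after retrieval:

```python
import json

def pages_to_records(raw_json: str):
    """Turn {page_name: {"data": ..., "url": ...}} into embeddable records.

    Only "page_content" is meant to be embedded; the URL travels along as
    metadata so it can be looked up after retrieval.
    """
    data = json.loads(raw_json)
    records = []
    for name, info in data.items():
        records.append({
            "page_content": info["data"],
            "metadata": {"page_name": name, "url": info["url"]},
        })
    return records

# Hypothetical one-page sample in the question's format
sample = '{"home": {"data": "Welcome page text", "url": "https://example.com/home"}}'
records = pages_to_records(sample)
print(records[0]["metadata"]["url"])  # https://example.com/home
```

The same records can then be turned into whatever document type the vector store expects, with the metadata dict passed through unchanged.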

I would appreciate it if someone could guide me down the right path to develop this model/functionality.

Upvotes: 1

Views: 1425

Answers (1)

Yueqi Peng

Reputation: 1

I would first use the LangChain JSON loader to load the JSON data; then you can split/chunk the documents however fits your case. Whether or not to save the documents to a vector store is up to you, but using the vector store as a retriever, as you did, seems correct. If you just want a summary, however, I would bypass the retriever and instead use a summarization chain such as the stuff chain:

from langchain.chains import LLMChain, StuffDocumentsChain
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate

# Define prompt
prompt_template = """Write a concise summary of the following:
"{text}"
CONCISE SUMMARY:"""
prompt = PromptTemplate.from_template(prompt_template)

# Define LLM chain
llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo-16k")
llm_chain = LLMChain(llm=llm, prompt=prompt)

# Define StuffDocumentsChain; "text" matches the prompt's input variable
stuff_chain = StuffDocumentsChain(llm_chain=llm_chain, document_variable_name="text")

# loader is the JSON loader described above
docs = loader.load()
print(stuff_chain.run(docs))
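For the original goal of returning a URL for a request, if each loaded document carries its URL in metadata (not shown in the chain above; this is a sketch under that assumption, with made-up sample data), the retrieved documents can be mapped back to links with plain Python:

```python
def urls_from_docs(docs):
    """Collect unique URLs from retrieved documents' metadata.

    Assumes each doc is a dict-like record whose metadata includes a
    "url" entry, as produced when the pages were loaded with the URL
    kept as metadata rather than mixed into the text.
    """
    seen, urls = set(), []
    for doc in docs:
        url = doc["metadata"].get("url")
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls

# Hypothetical retrieval result: two chunks from the same page
retrieved = [
    {"page_content": "Pricing info", "metadata": {"url": "https://example.com/pricing"}},
    {"page_content": "More pricing", "metadata": {"url": "https://example.com/pricing"}},
]
print(urls_from_docs(retrieved))  # ['https://example.com/pricing']
```

The same idea works with LangChain's own Document objects by reading `doc.metadata` instead of `doc["metadata"]`.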

Upvotes: 0
