Reputation: 21
I'm planning to develop a LangChain application that takes user input and returns the URL related to the user's request.
My data is in JSON format (around 35 pages):
{ page_name:{data:"",url:""}, .. }
I tried using RetrievalQA, but it didn't work:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain_google_genai import GoogleGenerativeAIEmbeddings
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001", google_api_key=GOOGLE_API_KEY)
all_splits = text_splitter.split_documents(documents)
vectordb = Chroma.from_documents(documents=all_splits, embedding=embeddings, persist_directory="chroma_db")
retriever = vectordb.as_retriever()
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    verbose=True,
)
qa.run('about this website')
I also tried combining the data and URL into a single string per page, but that didn't work correctly either:
data_dict = {}
for name, info in data.items():
    print(f'Name: {name}, URL: {info["url"]}, Data: {info["data"]}')
    data_dict[name] = f'URL: {info["url"]}, Data: {info["data"]}'
I would appreciate it if someone could point me toward the right approach for building this functionality.
Upvotes: 1
Views: 1425
Reputation: 1
I would first use the LangChain JSON loader (JSONLoader) to load the JSON data, then split or chunk the documents in whatever way fits your case. Whether to save the documents to a vector store is up to you, but using the vector store as a retriever, as you did, looks correct. If you only want a summary, however, I would skip the retriever and use a summarization chain such as the stuff chain:
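# Imports (exact paths may differ between LangChain versions)
from langchain.chains import LLMChain, StuffDocumentsChain
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

# Define prompt
prompt_template = """Write a concise summary of the following:
"{text}"
CONCISE SUMMARY:"""
prompt = PromptTemplate.from_template(prompt_template)
# Define LLM chain
llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo-16k")
llm_chain = LLMChain(llm=llm, prompt=prompt)
# Define StuffDocumentsChain; "text" must match the {text} placeholder in the prompt
stuff_chain = StuffDocumentsChain(llm_chain=llm_chain, document_variable_name="text")
# `loader` is the JSON loader mentioned above (its construction is not shown here)
docs = loader.load()
print(stuff_chain.run(docs))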
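Since the goal is to get a URL back rather than a summary, one option is to keep each page's URL as metadata next to its text, so that whichever retrieval step you use can return the URL of the best match. Below is a minimal sketch of that idea in plain Python, without any LangChain calls. The sample pages, the `to_records` and `best_url` helpers, and the word-overlap scoring are all hypothetical stand-ins; in a real setup the scoring would be the embedding similarity search done by the vector store.

```python
import json
import re

# Hypothetical sample in the question's format: {page_name: {"data": ..., "url": ...}}
raw = json.loads("""
{
  "about": {"data": "About this website and the team behind it",
            "url": "https://example.com/about"},
  "pricing": {"data": "Plans, pricing and billing options",
              "url": "https://example.com/pricing"}
}
""")


def to_records(pages):
    """Turn each page into a text field plus metadata that keeps the URL."""
    return [
        {"text": info["data"], "metadata": {"page_name": name, "url": info["url"]}}
        for name, info in pages.items()
    ]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def best_url(query, records):
    """Return the URL of the record sharing the most words with the query.

    Word overlap is only a stand-in for embedding similarity search.
    """
    query_words = tokenize(query)
    best = max(records, key=lambda r: len(query_words & tokenize(r["text"])))
    return best["metadata"]["url"]


records = to_records(raw)
print(best_url("about this website", records))  # prints https://example.com/about
```

In LangChain the same idea maps to documents whose `page_content` holds the page text and whose `metadata` holds the URL. Only the text gets embedded, and the URL comes back with each retrieved document, so the answer can read it from `metadata` instead of hoping the LLM repeats it correctly.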
Upvotes: 0