Reputation: 14228
I have a domain-specific JSON object which I want to store in a vector DB. I would be using embedding models like sentence-transformers or OpenAI's text-embedding-ada-002.
My questions are: a) Can these models efficiently compute proper embeddings for these JSON objects? b) Even if they can, how well can an LLM reason over them later? LLMs can reason over text, but can they reason over JSON?
The domain-specific data is not especially esoteric; the keys and values are mostly plain English.
Upvotes: 2
Views: 3250
Reputation: 2589
This approach works for me. I'm using Mistral 7B. I like to think it would be much better with Mixtral 8x7B, but I don't have the RAM for that on my M1 MacBook Pro.
// LangChain.js imports (module paths may differ slightly between LangChain versions)
import { CharacterTextSplitter } from 'langchain/text_splitter';
import { JSONLoader } from 'langchain/document_loaders/fs/json';
import { OllamaEmbeddings } from '@langchain/community/embeddings/ollama';
import { FaissStore } from '@langchain/community/vectorstores/faiss';

export const createVectorStore = async () => {
  const textSplitter = new CharacterTextSplitter({ chunkSize: 500, chunkOverlap: 100 });
  // Load only these JSON pointer paths from the file
  const loader = new JSONLoader('./tree.json', ['/Individuals', '/Relations', '/Source']);
  const docs = await loader.load();
  // Split the documents into text chunks
  const docSplit = await textSplitter.splitDocuments(docs);
  // Embedding function backed by a local Ollama model
  const embeddingsFunction = new OllamaEmbeddings({ model: 'nomic-embed-text' });
  // Store the embeddings in a FAISS vector store
  const vectorStore = await FaissStore.fromDocuments(docSplit, embeddingsFunction, {});
  return vectorStore;
};
const afterRagTemplate = `
Answer the question based only on the following text.
History: {chat_history}
Family tree database: {context}
Hints: the family tree database comes from a GEDCOM file.
Question: {input}
`;
Upvotes: 0
Reputation: 43
Large *language* models work on statistics over language. If you want an LLM to reason about your JSON, you should store it as text in your vector DB. For example, instead of `{ "customerName": "John Doe" }`, store it as "The name of the customer is John Doe."
So for question a): define "proper". You can create embeddings from raw JSON, but it's not likely that you can query them well later. For b): likely not possible on raw JSON.
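The "store it as text" idea above can be sketched as a small flattening helper. This is my own minimal sketch, not a library API: the function name `jsonToSentences` and the exact sentence templates are invented for illustration, and it assumes values are strings, numbers, arrays, or nested objects.

```typescript
type JsonRecord = Record<string, unknown>;

// Flatten a JSON object into plain-English sentences suitable for embedding.
// (Hypothetical helper; names and sentence templates are illustrative only.)
export function jsonToSentences(obj: JsonRecord, prefix = ""): string[] {
  const sentences: string[] = [];
  for (const [key, value] of Object.entries(obj)) {
    // Turn "customerName" / "customer_name" into a readable label like "customer name"
    const label =
      (prefix ? `${prefix} ` : "") +
      key.replace(/_/g, " ").replace(/([a-z])([A-Z])/g, "$1 $2").toLowerCase();
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      // Recurse into nested objects, carrying the parent key along as context
      sentences.push(...jsonToSentences(value as JsonRecord, label));
    } else if (Array.isArray(value)) {
      sentences.push(`The ${label} are ${value.join(", ")}.`);
    } else {
      sentences.push(`The ${label} is ${String(value)}.`);
    }
  }
  return sentences;
}
```

The resulting sentences can then be embedded and stored exactly like any other text chunks; the embedding model only ever sees natural language.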
Upvotes: -1