SomeDude

Reputation: 14228

How do I embed JSON documents using embedding models like sentence-transformers or OpenAI's embedding model?

I have a domain-specific JSON object that I want to store in a vector DB. I would be using embedding models like sentence-transformers or OpenAI's text-embedding-ada-002.

My questions are:
a) Can these models efficiently compute proper embeddings for these JSON objects?
b) Even if they can, how effectively can an LLM reason over them later? LLMs can reason over text, but can they do the same with JSON?

The domain-specific data is not particularly esoteric; the keys and values are mostly English.

Upvotes: 2

Views: 3250

Answers (2)

Martin Lockett

Reputation: 2589

This approach works for me. I'm using Mistral 7B. I'd like to think it would work much better with Mixtral 8x7B, but I don't have enough RAM for that on my M1 MacBook Pro.

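// Imports assume the LangChain.js 0.1/0.2 package layout; paths may differ in other versions.
import { CharacterTextSplitter } from "langchain/text_splitter";
import { JSONLoader } from "langchain/document_loaders/fs/json";
import { OllamaEmbeddings } from "@langchain/community/embeddings/ollama";
import { FaissStore } from "@langchain/community/vectorstores/faiss";

export const createVectorStore = async () => {
    const textSplitter = new CharacterTextSplitter({ chunkSize: 500, chunkOverlap: 100 });
    // Load the string values found under these JSON pointers in tree.json
    const loader = new JSONLoader('./tree.json', ["/Individuals", "/Relations", "/Source"]);
    const docs = await loader.load();
    // Split the documents into chunks of roughly 500 characters with 100 characters of overlap
    const docSplit = await textSplitter.splitDocuments(docs);
    // Embedding client; assumes a local Ollama server with nomic-embed-text pulled
    const embeddingsFunction = new OllamaEmbeddings({ model: 'nomic-embed-text' });
    // Embed every chunk and store the vectors in an in-memory FAISS index
    const vectorStore = await FaissStore.fromDocuments(docSplit, embeddingsFunction, {});
    return vectorStore;
};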
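To make the loading and splitting steps a little more concrete, here is a simplified, dependency-free sketch of roughly what they do. It is not LangChain's implementation: `resolvePointer`, `collectStrings` and `splitText` are hypothetical helpers, pointer handling follows RFC 6901 JSON Pointer, and `splitText` cuts fixed-size character windows with overlap, whereas LangChain's `CharacterTextSplitter` first splits on a separator and then merges the pieces up to `chunkSize`. The sample tree is made-up data.

```typescript
// Hypothetical helper: resolve an RFC 6901 JSON Pointer such as "/Individuals".
function resolvePointer(doc: unknown, pointer: string): unknown {
  if (pointer === "") return doc;
  const tokens = pointer
    .split("/")
    .slice(1)
    .map((t) => t.replace(/~1/g, "/").replace(/~0/g, "~"));
  let current: unknown = doc;
  for (const token of tokens) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Record<string, unknown>)[token];
  }
  return current;
}

// Hypothetical helper: collect every string leaf under a value, depth-first.
// Only the string values are kept; key names are not included.
function collectStrings(value: unknown, out: string[] = []): string[] {
  if (typeof value === "string") {
    out.push(value);
  } else if (Array.isArray(value)) {
    for (const item of value) collectStrings(item, out);
  } else if (value !== null && typeof value === "object") {
    for (const child of Object.values(value)) collectStrings(child, out);
  }
  return out;
}

// Hypothetical helper: fixed-size character windows that overlap by chunkOverlap.
function splitText(text: string, chunkSize: number, chunkOverlap: number): string[] {
  if (chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be smaller than chunkSize");
  }
  const chunks: string[] = [];
  const step = chunkSize - chunkOverlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this window reached the end
  }
  return chunks;
}

// Usage with made-up data shaped like a family tree export.
const tree = {
  Individuals: [{ name: "John Doe", born: "1900" }],
  Relations: [{ type: "spouse" }],
  Notes: { internal: "not loaded" },
};
const texts: string[] = [];
for (const pointer of ["/Individuals", "/Relations"]) {
  collectStrings(resolvePointer(tree, pointer), texts);
}
const chunks = splitText("abcdefghij", 4, 2);
```

Here `texts` ends up as `["John Doe", "1900", "spouse"]`. In this simplified model the key names never reach the embedding model, which matters for question b); check the `JSONLoader` documentation for its exact behaviour.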

const afterRagTemplate = `
    Answer the question based only on the following text.
    History: {chat_history}
    Family tree database: {context}
    Hints: The family tree database comes from a GEDCOM file.
    Question: {input}
`;
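In LangChain the placeholders are filled in by the chain that uses this template (not shown here). As a rough illustration only, this sketch shows what the final prompt looks like once the retrieved chunks are joined into `{context}`; `fillTemplate` and the sample chunks are hypothetical and not part of LangChain's API.

```typescript
// Hypothetical helper: replace {name} placeholders; unknown placeholders are left unchanged.
function fillTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (match: string, key: string) =>
    Object.prototype.hasOwnProperty.call(vars, key) ? vars[key] : match,
  );
}

const ragTemplate = `Answer the question based only on the following text.
History: {chat_history}
Family tree database: {context}
Hints: The family tree database comes from a GEDCOM file.
Question: {input}`;

// Made-up chunks standing in for what the vector store would return.
const retrievedChunks = ["John Doe was born in 1900.", "John Doe married Jane Roe."];

const filledPrompt = fillTemplate(ragTemplate, {
  chat_history: "",
  context: retrievedChunks.join("\n\n"),
  input: "When was John Doe born?",
});
console.log(filledPrompt);
```

The LLM only ever sees this final string, so whatever text the loader stored (raw JSON values or prose) is what it has to reason over.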

Upvotes: 0

Gernot Glawe

Reputation: 43

Large LANGUAGE models work on the statistics of language. If you want to reason over your JSON, you should store it as text in your vector DB. For example, given { "Customername": "john doe" },

store it as "The name of the customer is john doe."
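For simple objects this conversion can be automated. A minimal sketch, assuming plain or nested objects with camelCase or snake_case keys: the hypothetical `jsonToSentences` helper emits one templated sentence per leaf value. Its output is mechanical ("The customer name is john doe."), so hand-written phrasing like the example above will usually read more naturally.

```typescript
// Hypothetical helper: "customerName" or "customer_name" -> "customer name".
function keyToWords(key: string): string {
  return key
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    .replace(/[_-]+/g, " ")
    .trim()
    .toLowerCase();
}

// Hypothetical helper: one templated English sentence per leaf value.
// Nested keys are joined into the subject, e.g. { address: { city } } -> "address city".
function jsonToSentences(value: unknown, path: string[] = []): string[] {
  if (value === null || value === undefined) return [];
  if (Array.isArray(value)) {
    const out: string[] = [];
    for (const item of value) out.push(...jsonToSentences(item, path));
    return out;
  }
  if (typeof value === "object") {
    const out: string[] = [];
    for (const [key, child] of Object.entries(value)) {
      out.push(...jsonToSentences(child, [...path, keyToWords(key)]));
    }
    return out;
  }
  const subject = path.length > 0 ? path.join(" ") : "value";
  return [`The ${subject} is ${String(value)}.`];
}

const record = { customerName: "john doe", address: { city: "Berlin" } };
const recordText = jsonToSentences(record).join(" ");
// "The customer name is john doe. The address city is Berlin."
```

The resulting `recordText` is what you would embed and store instead of the raw JSON string.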

So for question a), it depends on what you mean by "proper". You can create embeddings from JSON, but it's not likely that you can query them meaningfully later. For b), it's likely not possible.
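For context on what "query them later" involves: a vector DB query typically embeds the question with the same model and returns the stored vectors closest to it, often by cosine similarity. Below is a brute-force sketch of that search; the three-dimensional vectors and ids are made up for illustration, and many vector DBs use approximate nearest-neighbour indexes rather than a linear scan.

```typescript
type StoredVector = { id: string; vector: number[] };

// Cosine similarity: dot product divided by the product of the vector norms.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same dimension");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force top-k: score every stored vector, sort descending, keep k.
function topK(query: number[], store: StoredVector[], k: number): { id: string; score: number }[] {
  return store
    .map((entry) => ({ id: entry.id, score: cosineSimilarity(query, entry.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Made-up vectors; real embeddings have hundreds or thousands of dimensions.
const vectorStore: StoredVector[] = [
  { id: "doc-a", vector: [0.9, 0.1, 0.0] },
  { id: "doc-b", vector: [0.2, 0.8, 0.3] },
  { id: "doc-c", vector: [0.0, 0.0, 1.0] },
];
const hits = topK([1, 0, 0], vectorStore, 1);
```

Whether the stored text was raw JSON or sentences, the search itself is the same; what differs is which text the model turned into each vector.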

Upvotes: -1
