Reputation: 4035

TypeScript LangChain add field to document metadata

How should I add a field to the metadata of Langchain's Documents?

For example, using the CharacterTextSplitter gives a list of Documents:

const splitter = new CharacterTextSplitter({
  separator: " ",
  chunkSize: 7,
  chunkOverlap: 3,
});
// createDocuments returns a Promise, so await the result.
const docs = await splitter.createDocuments([text]);

A document will have the following structure:

{
  "pageContent": "blablabla",
  "metadata": {
    "name": "my-file.pdf",
    "type": "application/pdf",
    "size": 12012,
    "lastModified": 1688375715518,
    "loc": { "lines": { "from": 1, "to": 3 } }
  }
}

I want to add a field to the metadata of each of these documents.

Upvotes: 1

Answers (3)

Hussein Menshawi

Reputation: 96

You can wrap the text in a Document that already carries the metadata, then pass it to the splitDocuments method.

Example:

// pageContent and metadata belong in the same object passed to the Document constructor.
const docOutput = await splitter.splitDocuments([
  new Document({ pageContent: text, metadata: { someField: "someValue" } }),
]);

Upvotes: 0

Chris Chiasson

Reputation: 816

The recommended text splitter documentation doesn't currently show how to do this, but the second argument of createDocuments takes an array of metadata objects, one per input text. The properties of each object are assigned into the metadata of every document produced from the corresponding text.

const myMetaData = { url: "https://www.google.com" };
// chunkHeader is a string defined elsewhere; the third argument is optional.
const documents = await splitter.createDocuments([text], [myMetaData],
  { chunkHeader, appendChunkOverlapHeader: true });

After this, documents is an array in which each element is an object with pageContent and metadata properties. The metadata includes the properties from myMetaData above, and pageContent has the text of chunkHeader prepended.

{
  pageContent: <chunkHeader plus the chunk>,
  metadata: <all properties of myMetaData plus loc (text line numbers of chunk)>
}
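
If you split several texts at once, the metadata array lines up with the texts array by index. Here is a minimal sketch of building the two arrays together. The SourceFile shape, the toSplitterInput helper and the docId field are made up for illustration and are not part of LangChain:

```typescript
// Hypothetical input shape: one entry per source text.
interface SourceFile {
  name: string;
  text: string;
}

// Build the two parallel arrays for createDocuments(texts, metadatas).
// metadatas[i] describes texts[i]; `extra` is copied into every entry.
function toSplitterInput(
  files: SourceFile[],
  extra: Record<string, unknown>,
) {
  const texts = files.map((f) => f.text);
  const metadatas = files.map((f, i) => ({ ...extra, name: f.name, docId: i }));
  return { texts, metadatas };
}

const { texts, metadatas } = toSplitterInput(
  [
    { name: "a.txt", text: "first text" },
    { name: "b.txt", text: "second text" },
  ],
  { url: "https://www.google.com" },
);

// With a splitter in scope, the call would then be:
// const documents = await splitter.createDocuments(texts, metadatas);
```

Keeping both arrays in one helper makes it harder for them to drift out of order.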

Upvotes: 1

Joost Döbken

Reputation: 4035

Ok... just loop over the docs I suppose:

// docs is the array returned by the splitter; doc_id is whatever value you want to attach.
for (const doc of docs) {
  doc.metadata["doc_id"] = doc_id;
}
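
If you would rather not mutate the documents in place, the same idea can be sketched as a small helper that returns copies. The Doc interface below is only a stand-in for the shape shown in the question, and addMetadata is a hypothetical name, not a LangChain function:

```typescript
// Minimal stand-in for the document shape from the question (assumption).
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Return copies of the documents with the extra fields merged into metadata.
// Fields in `extra` win if a key already exists; the inputs are left untouched.
function addMetadata(docs: Doc[], extra: Record<string, unknown>): Doc[] {
  return docs.map((doc) => ({
    ...doc,
    metadata: { ...doc.metadata, ...extra },
  }));
}

const original: Doc[] = [
  { pageContent: "blablabla", metadata: { name: "my-file.pdf" } },
];
const tagged = addMetadata(original, { doc_id: "abc-123" });
```

Note that spreading produces plain objects rather than Document instances, so if later code relies on the class itself, the in-place loop is the simpler choice.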

Upvotes: 0
