Reputation: 1
I'm getting an IndexError: list index out of range after adding RecursiveJsonSplitter to my code. For context, I'm retrieving JSON from MongoDB and don't want any local files other than the db that I have to initialize.
This is the JSON retrieval code:
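def import_json_files():
    # `collection` is the pymongo collection, set up elsewhere
    cursor = collection.find({})
    # return json.dumps(cursor, default=json_util.default)
    # Each document is serialized to a JSON string, so this returns a list of strings
    json_docs = [json.dumps(doc, default=json_util.default) for doc in cursor]
    return json_docs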
This is the code I'm working on. It fails at this line: json_chunks = text_splitter.split_json(json_data=documents)
# Get data from MongoDB
json_data = import_json_files()
# Output JSON data to a file
with open('data.txt', 'w') as f:
    json.dump(json_data, f, ensure_ascii=False, indent=4, sort_keys=True, separators=(',', ': '))
with open('client_secrets.json') as f:
    secrets = json.load(f)
os.environ["OPENAI_API_KEY"] = secrets['openai_api_key']
# Convert and ingest documents into vectorstore
json_data_str = json.dumps(json_data)
documents = json.loads(json_data_str)
if not documents:
    print("No documents found.")
    exit()
text_splitter = RecursiveJsonSplitter(max_chunk_size=1000)
json_chunks = text_splitter.split_json(json_data=documents)
embeddings = OpenAIEmbeddings()
db = FAISS.from_documents(json_chunks, embeddings)
retriever = db.as_retriever()
This is the error:
  File "/Users/abcd/Desktop/RAG/simple_example.py", line 38, in <module>
    json_chunks = text_splitter.split_json(json_data=documents)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/langchain_text_splitters/json.py", line 89, in split_json
    chunks = self._json_split(json_data)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/langchain_text_splitters/json.py", line 76, in _json_split
    self._set_nested_dict(chunks[-1], current_path, data)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/langchain_text_splitters/json.py", line 32, in _set_nested_dict
    d[path[-1]] = value
IndexError: list index out of range
I've looked through the LangChain documentation and I don't see a way to do this. Is this possible, or am I banging my head against the wall?
Upvotes: 0
Views: 1721
Reputation: 1
The documents variable is a list, whereas RecursiveJsonSplitter.split_json() accepts a Dict[str, Any]. You can do either of the following:
- Set convert_lists=True when calling the split_json method. This produces multiple chunks with the list indices as keys.
- Use the create_documents method, which returns the split chunks as LangChain documents.
Refer for more: RecursiveJsonSplitter Documentation
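As a rough illustration of the shape mismatch: in the question, each MongoDB document is serialized with json.dumps, so the list actually holds JSON strings. The sketch below parses them back and wraps them in a single dict keyed by index, which is the kind of top-level dict split_json expects. The helper name to_splitter_input and the sample data are hypothetical, and this is only one possible approach, not something taken from the library.

```python
import json


def to_splitter_input(json_docs):
    """Turn a list of JSON strings into one dict keyed by list index.

    Hypothetical helper: the name and the key scheme are illustrative only.
    """
    return {str(i): json.loads(doc) for i, doc in enumerate(json_docs)}


# Hypothetical sample standing in for the MongoDB query result
json_docs = ['{"name": "a", "tags": ["x", "y"]}', '{"name": "b"}']

payload = to_splitter_input(json_docs)
print(type(payload).__name__, list(payload.keys()))
```

The resulting payload is a dict, so it could then be passed as text_splitter.split_json(json_data=payload), assuming the rest of the question's setup is unchanged.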
One caveat to watch for is this issue: RecursiveJsonSplitter Retains State Across Invocations Due to Mutable Default Argument.
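For readers unfamiliar with the pitfall named in that issue title, here is a generic Python sketch of how a mutable default argument keeps state between calls. The functions collect_buggy and collect_fixed are made up for illustration and are not the library's code.

```python
def collect_buggy(item, chunks=[{}]):
    # The default list is created once, when the function is defined,
    # so every call that omits `chunks` shares the same list object.
    chunks[-1][item] = True
    return chunks


def collect_fixed(item, chunks=None):
    # Using None as the default creates a fresh list on each call.
    if chunks is None:
        chunks = [{}]
    chunks[-1][item] = True
    return chunks


first = collect_buggy("a")
second = collect_buggy("b")
print(second)                 # state from the first call has leaked in
print(collect_fixed("a"))
print(collect_fixed("b"))
```

If the splitter behaves this way in your installed version, results from one call may show up in the next, so it is worth checking chunk contents when calling it more than once in the same process.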
Upvotes: 0