Llama Index: Property Graph Index Failing to Receive Nodes and Create the Graph

Question

I'm building an API that uses Llama Index to create a property graph for a project, and I'm running into problems with the graph creation step. Below is my code:

@dataclass
class GraphGenService(AsyncService):
    file: UploadFile
    graph_store: Neo4jPropertyGraphStore = None

    async def SetUp_GraphStore(self) -> None:
        self.graph_store = Neo4jPropertyGraphStore(
            username="neo4j",
            ....
        )

    async def setup_models(self) -> None:
        self.llm = Gemini(
            model_name="models/gemini-1.5-flash-latest",
            ....
            temperature=0.0,
            max_tokens=2500,
        )
        self.embed_model = OpenAIEmbedding(
            model_name="text-embedding-3-large",
            ...
        )

    async def extract_file_contents(self):
        FileExtract: ExtractedFile = (
            await FileContentExtractionEngine().extract_contents(
                file=self.file, cc=True
            )
        )
        # logger.debug("%s", FileExtract)
        return await FileExtract.to_llamaindex_docs()

    def parse_documents(self, documents: List[str]) -> List[dict[str, str]]:
        parser = SentenceSplitter(chunk_size=700)
        return parser.get_nodes_from_documents(documents=documents, show_progress=True)

    async def build_index(self, parsed_nodes: List[str]) -> PropertyGraphIndex:
        entities = Literal["PERSONAGEM", "EVENTO", "TITULO", "STATUS", "GRUPO"]
        relations = Literal[
            "CASA_COM",
            "TEM_FILHO",
            "DIVORCIA_DE",
            "TORNA_SE",
            "ENCONTRA",
            "FUNDE_COM",
            "ESCOLHE",
            "RECRUTA",
            "COMBATE",
        ]
        Validation_Schema = [
            ("PERSONAGEM", "CASA_COM", "PERSONAGEM"),
            ("PERSONAGEM", "TEM_FILHO", "PERSONAGEM"),
            ("PERSONAGEM", "DIVORCIA_DE", "PERSONAGEM"),
            ("PERSONAGEM", "TORNA_SE", "STATUS"),
            ("PERSONAGEM", "ENCONTRA", "PERSONAGEM"),
            ("PERSONAGEM", "FUNDE_COM", "PERSONAGEM"),
            ("PERSONAGEM", "ESCOLHE", "PERSONAGEM"),
            ("PERSONAGEM", "RECRUTA", "PERSONAGEM"),
            ("PERSONAGEM", "COMBATE", "GRUPO"),
            ("GRUPO", "ESCOLHE", "PERSONAGEM"),
        ]
        index = PropertyGraphIndex(
            nodes=parsed_nodes,
            llm=self.llm,
            use_async=True,
            kg_extractors=[
                SchemaLLMPathExtractor(
                    llm=self.llm,
                    extract_prompt=""" 
                    
                Analise a linha do tempo dos eventos e extraia as relações principais entre entidades. Forneça a resposta no seguinte formato para cada relação:

                (Entidade1, RELAÇÃO, Entidade2)

                Instruções:

                Liste cada relação em uma linha separada.
                Inclua apenas as relações principais descritas explicitamente no documento.
                Use as seguintes entidades: PERSONAGEM, EVENTO, TITULO, STATUS, GRUPO.
                Use as seguintes relações: CASA_COM, TEM_FILHO, DIVORCIA_DE, TORNA_SE, ENCONTRA, FUNDE_COM, ESCOLHE, RECRUTA, COMBATE.
                Para personagens, use seus nomes completos em maiúsculas (ex: MARIKA, GODFREY, RADAGON).
                Para outras entidades, use descrições concisas em maiúsculas (ex: PRIMEIRO_LORDE_ELDEN, BESTA_DE_ELDEN).
                Mantenha a ordem cronológica dos eventos conforme apresentada no documento.
                Não inclua informações adicionais ou explicações além do solicitado.
                A precisão na extração e formatação das relações é crucial. Forneça apenas as informações solicitadas, sem elaborações adicionais.

                Exemplos do formato esperado:
                (MARIKA, CASA_COM, GODFREY)
                (GODFREY, TORNA_SE, PRIMEIRO_LORDE_ELDEN)
                (MARIKA, TEM_FILHO, GODWYN)
                (RADAGON, ENCONTRA, RENNALA)

       
            """,
                    possible_entities=entities,
                    possible_relations=relations,
                    kg_validation_schema=Validation_Schema,
                    num_workers=20,
                    strict=False,
                )
            ],
            property_graph_store=self.graph_store,
            embed_model=self.embed_model,
            show_progress=True,
        )
        index_nodes = await index._insert_nodes(parsed_nodes)
        index.build_index_from_nodes(index_nodes)

        return index


    async def retrieve_nodes(self, retriever) -> List[NodeWithScore]:
        query = """
        
           Identifique todos os filhos de Marika mencionados no texto.
           
            """
        try:
            nodes = await retriever.aretrieve(query)
            if not nodes:
                logger.debug("No nodes retrieved. Trying a more general query.")
                nodes = await retriever.aretrieve(query)
            return nodes
        except Exception as e:
            logger.error("Error during retrieval: %s", str(e))
            return []

    async def execute(self) -> List[Tuple[str, str, str]]:
        await self.setup_models()
        await self.SetUp_GraphStore()
        documents = await self.extract_file_contents()
        parsed_nodes = self.parse_documents(documents)
        index = await self.build_index(parsed_nodes)
        retriever = index.as_retriever(include_text=False)
        nodes = await self.retrieve_nodes(retriever)
        Triples = [json.dumps(node.text) for node in nodes]

        return Triples

Issue:

When I try to create the index, I get "no nodes retrieved..." even though the nodes are correctly extracted from the document and exist at that point in the process. I'm using the private method _insert_nodes because it's the only way that seems to work. However, even though the graph is created (and visible in Neo4j), the API request never completes: it doesn't return a 200 status or any other response.

I'm not using .from_documents() because I'm parsing the documents via APIdog.

If I attempt to use the "index" without the private method, it doesn't work and returns "no nodes retrieved..." despite the nodes existing (I’ve confirmed this with logging).
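
For reference, the "no nodes retrieved" text appears to be the debug message logged inside retrieve_nodes in the code above, so it would be emitted at retrieval time, and the retry there sends the same query a second time. Below is a minimal sketch of a fallback that actually tries a broader query. It is not Llama Index code: StubRetriever, retrieve_with_fallback, and the fallback query "Marika" are hypothetical names used only for illustration, and the stub merely stands in for whatever index.as_retriever() returns.

```python
import asyncio
from typing import Dict, List, Sequence


class StubRetriever:
    """Hypothetical stand-in for a retriever exposing an async aretrieve()."""

    def __init__(self, hits: Dict[str, List[str]]) -> None:
        # Maps an exact query string to the results it should return.
        self._hits = hits

    async def aretrieve(self, query: str) -> List[str]:
        # Unknown queries return an empty list, mimicking "no nodes retrieved".
        return self._hits.get(query, [])


async def retrieve_with_fallback(retriever, queries: Sequence[str]) -> List[str]:
    """Try each query in order and return the first non-empty result."""
    for query in queries:
        nodes = await retriever.aretrieve(query.strip())
        if nodes:
            return nodes
    # Every query came back empty.
    return []


retriever = StubRetriever({"Marika": ["(MARIKA, TEM_FILHO, GODWYN)"]})

# The specific query finds nothing in the stub, so the broader one is used.
result = asyncio.run(
    retrieve_with_fallback(
        retriever,
        ["Identifique todos os filhos de Marika mencionados no texto.", "Marika"],
    )
)

# No query matches here, so the result is an empty list.
empty = asyncio.run(retrieve_with_fallback(retriever, ["unknown"]))
```

With a real retriever, the list passed as queries would go from the most specific wording to the most general one, and the log message would then match what the code does.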

The strangest part is that sometimes it works, and sometimes it doesn't.

I've tried:

Using it without the private method.
Refactoring the code.
Clearing Neo4j's cache.
Resetting Neo4j's database.
Trying different document extraction methods.
Using .from_documents.
Modifying prompts and queries.

But nothing seems to work.

