niuniu

Reputation: 1

When I query data with neo4j, the queries get slower and slower

I imported my data with joern, but when I tried to generate PDG diagrams, the queries became slower and slower as the for loop progressed. The code used for the query is shown below. I profiled the program with cProfile and found that this function is the bottleneck.

def getUSENodesVar(db, func_id):
    query = "g.v(%s).out('USE').code" % func_id
    ret = db.runGremlinQuery(query)
    if ret == []:
        return False
    else:
        return ret

I would like to improve the query speed.

Upvotes: 0

Views: 49

Answers (1)

cybersam

Reputation: 67019

When you have multiple func_ids, it is inefficient to make a separate Gremlin query for each one.

Instead, your function should take a list of func_ids and make a single Gremlin query that returns a collection of distinct code values. For example:

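def getUSENodesVar(db, func_ids):
    # Render the list as a comma-separated literal, e.g. [1, 2] -> "1, 2"
    func_ids_str = str(func_ids).replace('[', '').replace(']', '')
    # Interpolate the ids into the query text
    query = "g.V().hasId(within(%s)).out('USE').values('code').dedup()" % func_ids_str
    return db.runGremlinQuery(query)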

Since joern's runGremlinQuery only takes a query argument (and does not also take a parameters argument), this function converts the input list (func_ids) into a string (func_ids_str) that the Gremlin API will understand to be a list when there are multiple ids.

Of course, the client of getUSENodesVar will also have to be changed to pass it a list of ids, and to handle the returned list of codes.
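As a rough sketch of what the client-side change could look like, the code below sends the ids in batches instead of one query per id. The names build_use_query, chunked, get_use_codes and the batch_size default are hypothetical and chosen only for illustration. It assumes that db.runGremlinQuery returns a list of code strings (possibly empty) and that the server accepts the query syntax shown above. Note that it returns an empty list rather than False when nothing is found.

```python
def build_use_query(func_ids):
    # Render the ids as a comma-separated literal, e.g. [1, 2, 3] -> "1, 2, 3".
    # int() rejects anything that is not a plain integer id.
    ids_literal = ", ".join(str(int(i)) for i in func_ids)
    return ("g.V().hasId(within(%s)).out('USE').values('code').dedup()"
            % ids_literal)


def chunked(seq, size):
    # Yield consecutive slices of at most `size` items.
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def get_use_codes(db, func_ids, batch_size=500):
    # Hypothetical helper: one query per batch instead of one per id.
    # batch_size is an arbitrary assumption that keeps the query text bounded.
    codes = []
    seen = set()
    for batch in chunked(list(func_ids), batch_size):
        result = db.runGremlinQuery(build_use_query(batch)) or []
        for code in result:
            # dedup() only applies within one query, so also dedup across batches.
            if code not in seen:
                seen.add(code)
                codes.append(code)
    return codes
```

The caller would then collect all function ids first and call get_use_codes(db, all_func_ids) once, rather than calling getUSENodesVar inside the loop. Because the results are deduplicated, this form no longer tells you which id each code came from; if the PDG construction needs that mapping, the query would have to return the id alongside each code.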

Upvotes: 0
