Reputation: 3390

Neo4j: how to return all node labels with Cypher?

I can't find how to return a node's labels with Cypher.

Does anybody know the syntax for this operation?

Upvotes: 82

Answers (9)

Kaan

Reputation: 5794

How do you get all labels for a specific node?

Use the labels() function, as in this example, which matches nodes whose name property has the value 'Alice':

MATCH (a) WHERE a.name = 'Alice'
RETURN labels(a)

The return type of labels() is LIST<STRING>, so it can return zero or more labels (an empty list for a node that has no labels).

More info here: https://neo4j.com/docs/cypher-manual/5/functions/list/#functions-labels

How do you get all labels in the graph?

This question has several upvoted answers, but only one of them is the approach you should use (listed below as "Solution #1"). Below are three ways of getting all in-use labels in the graph, compared on a test data set of 109,120 nodes:

MATCH (x) RETURN count(x)
109120

Solution #1: Use the built-in procedure `db.labels()`

The usage looks like this:

CALL db.labels();

On my test data set, this query completed in ~1 ms (successive runs shown):

Started streaming 9 records in less than 1 ms and completed in less than 1 ms.
Started streaming 9 records after 1 ms and completed after 1 ms.
Started streaming 9 records in less than 1 ms and completed after 1 ms.

Here's the execution plan output:

EXPLAIN CALL db.labels();

Operator        Identifiers  Details                           Estimated rows
ProcedureCall   label        db.labels() :: (label :: STRING)  10
ProduceResults  label        label                             10

Note: the estimated row count is 10, and no step in the plan scans nodes, so the plan does not depend on the ~109,000 nodes in the graph.

Solution #2: Match all nodes, call `labels()` on each node, get distinct results

The query looks like this:

MATCH (n) RETURN DISTINCT labels(n)

Here are several runs of that query, each more than an order of magnitude slower than Solution #1:

Started streaming 9 records after 1 ms and completed after 41 ms.
Started streaming 9 records after 1 ms and completed after 36 ms.
Started streaming 9 records in less than 1 ms and completed after 37 ms.

The execution plan is more complicated and clearly shows that every node in the graph is evaluated. Again, my test data set has 109,120 nodes, and the first step (AllNodesScan) estimates exactly that many rows. With 1 million nodes in the graph, this approach would scan all 1 million; with 10 million, all 10 million, and so on.

EXPLAIN MATCH (n) RETURN DISTINCT labels(n)

Operator        Identifiers  Details                   Estimated rows
AllNodesScan    n            n                         109,120
Distinct        `labels(n)`  labels(n) as `labels(n)`  103,664
ProduceResults  `labels(n)`  `labels(n)`               103,664

While the result is correct, this approach is significantly more expensive to evaluate than solution #1.

Solution #3: Like Solution #2, with additional steps that unwind the label lists and return the distinct labels

The query looks like this:

MATCH (n)
WITH DISTINCT labels(n) AS labels
UNWIND labels AS label
RETURN DISTINCT label
ORDER BY label

Here are several runs of this query; like Solution #2, they complete in the mid-30 ms range:

Started streaming 9 records in less than 1 ms and completed after 33 ms.
Started streaming 9 records after 1 ms and completed after 32 ms.
Started streaming 9 records in less than 1 ms and completed after 37 ms.
Started streaming 9 records after 4 ms and completed after 36 ms.

The execution plan starts the same way as Solution #2's, but adds further steps whose estimated row counts are on the order of the entire data set or larger:

EXPLAIN MATCH (n)
WITH DISTINCT labels(n) AS labels
UNWIND labels AS label
RETURN DISTINCT label
ORDER BY label

Operator        Identifiers    Details                           Estimated rows
AllNodesScan    n              n                                 109,120
Distinct        labels         labels(n) AS labels               103,664
Unwind          labels, label  labels AS label                   1,036,640
Distinct        label          label                             984,808
Sort            label          label ASC (ordered by label ASC)  984,808
ProduceResults  label          label (ordered by label ASC)      984,808

Conclusion

If your goal is to determine which labels exist in a graph, Solution #1 looks like the clear winner: it is not only the fastest and simplest approach, but its performance is also not bound by the number of nodes in the graph (so it should remain fast as the graph grows).

I do not see any measurable benefit to using Solutions #2 or #3. Compared to Solution #1, both are slower and more complicated to write, and, unlike Solution #1, their execution plans show that their performance is bound directly by the number of nodes in the graph. They will run more slowly on larger data sets.

Upvotes: 0

Guest

Reputation: 1

MATCH (n) WHERE n.name = "abc" RETURN labels(n)

It returns all the labels of the node whose name is "abc".

Upvotes: 0

Bruno Peres

Reputation: 16375

Neo4j 3.0 introduced the procedure db.labels(), which returns all available labels in the database. Use:

CALL db.labels();

Upvotes: 47

arganzheng

Reputation: 1334

If you want the labels of a specific node, use labels(node). If you only want all node labels in Neo4j, use the procedure CALL db.labels() instead. Never use this query: MATCH (n) RETURN DISTINCT labels(n). It scans every node in the graph (the equivalent of a full table scan), which is very slow.

Upvotes: 4

petra

Reputation: 2792

To get all distinct node labels:

MATCH (n) RETURN distinct labels(n)

To get the node count for each label:

MATCH (n) RETURN distinct labels(n), count(*)
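One detail worth knowing about the count query: because count(*) is an aggregate, Cypher groups rows by the non-aggregated expression labels(n), so it counts nodes per label combination (and the distinct is redundant, since grouped rows are already unique). A node labeled both Person and Employee is counted once under [Person, Employee], not under each label separately. The plain-Java sketch below models the two groupings in memory; the class LabelCounts and the sample data are hypothetical, and this is not the Neo4j API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of label counting; NOT the Neo4j API.
// Each inner list stands for the result of labels(n) for one node.
public class LabelCounts {

    // Models: MATCH (n) RETURN labels(n), count(*)
    // Groups by the whole label list, so every node is counted exactly once.
    public static Map<List<String>, Integer> countPerCombination(List<List<String>> nodes) {
        Map<List<String>, Integer> counts = new LinkedHashMap<>();
        for (List<String> combo : nodes) {
            counts.merge(combo, 1, Integer::sum);
        }
        return counts;
    }

    // Models: MATCH (n) UNWIND labels(n) AS label RETURN label, count(*)
    // A node with k labels contributes to k different counts.
    public static Map<String, Integer> countPerLabel(List<List<String>> nodes) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by label name
        for (List<String> combo : nodes) {
            for (String label : combo) {
                counts.merge(label, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> nodes = List.of(
                List.of("Person"),
                List.of("Person", "Employee"),
                List.of("Movie"));
        System.out.println(countPerCombination(nodes));
        System.out.println(countPerLabel(nodes));
    }
}
```

With this sample data, Person appears in two separate combination rows with a count of 1 each, but gets a count of 2 when grouped per individual label.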

Upvotes: 110

ErnestoE

Reputation: 1304

If you want all the individual labels (not the combinations), you can build on the answers above:

MATCH (n)
WITH DISTINCT labels(n) AS labels
UNWIND labels AS label
RETURN DISTINCT label
ORDER BY label
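The difference from the plain DISTINCT labels(n) query is that DISTINCT on its own de-duplicates whole label lists, so nodes labeled [Person] and [Person, Employee] come back as two separate rows, while UNWIND followed by DISTINCT yields each label once. A plain-Java sketch of that logic, using a hypothetical class LabelSets and made-up sample data (this is not the Neo4j API):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory model of the two Cypher queries; NOT the Neo4j API.
// Each inner list stands for the result of labels(n) for one node.
public class LabelSets {

    // Models: MATCH (n) RETURN DISTINCT labels(n)
    // De-duplicates whole label lists, i.e. label combinations.
    public static List<List<String>> distinctCombinations(List<List<String>> nodes) {
        Set<List<String>> seen = new LinkedHashSet<>(nodes);
        return new ArrayList<>(seen);
    }

    // Models: WITH DISTINCT labels(n) AS labels UNWIND labels AS label
    //         RETURN DISTINCT label ORDER BY label
    public static List<String> distinctLabels(List<List<String>> nodes) {
        Set<String> labels = new TreeSet<>(); // sorted, like ORDER BY label
        for (List<String> combo : distinctCombinations(nodes)) {
            labels.addAll(combo); // UNWIND each list, keep each label once
        }
        return new ArrayList<>(labels);
    }

    public static void main(String[] args) {
        List<List<String>> nodes = List.of(
                List.of("Person"),
                List.of("Person", "Employee"),
                List.of("Movie"),
                List.of("Person"));
        System.out.println(distinctCombinations(nodes));
        System.out.println(distinctLabels(nodes));
    }
}
```

Running main prints the three distinct combinations, then the sorted individual labels Employee, Movie and Person.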

Upvotes: 25

Ken Williams

Reputation: 24005

If you're using the Java API, you can quickly get an iterator of all the Labels in the database like so:

// Neo4j 2.x-era embedded Java API
GraphDatabaseService db = new GraphDatabaseFactory().newEmbeddedDatabase(pathToDatabase);
ResourceIterable<Label> labs = GlobalGraphOperations.at(db).getAllLabels();

Upvotes: 4