khituras
khituras

Reputation: 1091

Neo4j 2.0: Indexing array-valued properties with schema indexing

I have nodes with multiple "sourceIds" in one array-valued property called "sourceIds", just because there could be multiple resources a node could be derived from (I'm assembling multiple databases into one Neo4j model). I want to be able to look up nodes by any of their source IDs. With legacy indexing this was no problem, I would just add a node to the index associated with each element of the sourceIds property array.

Now I wanted to switch to indexing with labels and I'm wondering how that kind of index works here. I can do

CREATE INDEX ON :<label>(sourceIds)

but what does that actually do? I hoped it would just create index entries for each array element, but that doesn't seem to be the case. With

MATCH n:<label> WHERE "testid" in n.sourceIds RETURN n

the query takes between 300ms and 500ms which is too long for an index lookup (other schema indexes work three to five times faster). With

MATCH n:<label> WHERE n.sourceIds="testid" RETURN n

I don't get a result. That's clear because it's an array property but I just gave it a try since it would make sense if array properties would be broken down to their elements for indexing purposes.

So, is there a way to handle array properties with schema indexing or are there plans or will I just have to stick to legacy indexing here? My problem with the legacy Lucene index was that I hit the max number of boolean clauses (1024). Another question thus would be: Can I raise this number? Lucene allows that, but can I do this with the Lucene index used by Neo4j?

Thanks and best regards!

Edit: A bit more elaboration on why I hit the boolean clauses max limit: I need to export specific parts of the database into custom file formats for text processing pipelines. These pipelines use components I cannot (be it for the sake of accessibility or time) change to query Neo4j directly, so I'd rather stay with the defined required file format(s). I do the export via the pattern "give me all IDs in the DB; now, for batches of IDs, query the desired information (e.g. specific paths) from Neo4j and store the results to file". Why I use batches at all? Well, if I don't, things are slowed down significantly via the connection overhead. Thus, large batches are a kind of optimization here.

Upvotes: 2

Views: 1529

Answers (1)

Eve Freeman
Eve Freeman

Reputation: 33155

Schema indexes can only do exact matches right now. Your "testid" in n.sourceIds does not use the index (as shown by your query times). I think there are plans to make this behave better, but I'm waiting for them just as eagerly as you are.

I've actually hit a lower max in the lucene query: 512. If there is a way to increase it I'd love to hear of it. The way I got around it is just doing more than one query if I have one of the rare cases that actually goes over 512 ids. What query are you doing where you need more?

Upvotes: 1

Related Questions