Reputation: 144
I am using Datahike 0.6.1531 (not Datomic) on the JVM. I have a list of book titles to display in a web app. If a book is "notable", I do something special with it, such as applying a background color or appending an emoji.
I would like to return a vector that resembles something like this:
[{:db/id 339, :resource-name "Notation as a Tool of Thought"}
{:db/id 338, :resource-name "The Science of Radio", :notable? :true}
{:db/id 337, :resource-name "Journey Into Mathematics"}
{:db/id 336, :resource-name "Street Fighting Mathematics"}
...]
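(To show why the :notable? flag matters, here is roughly how I consume the result when rendering. This is a minimal Hiccup-style sketch; render-title is just an illustrative helper, not my actual view code.)
(defn render-title [{:keys [resource-name notable?]}]
  ;; decorate notable books, e.g. highlight and append an emoji
  (if notable?
    [:li {:style "background-color: #fdf6b2"} (str resource-name " ⭐")]
    [:li resource-name]))

(map render-title books) ; books = result of the pull-many query below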
Performing the following pull-many query with 3 attr-ids (including :db/id) on a range of 400 or so entities takes ~2,900 ms:
(require '[datahike.api :as d]) ; version 0.6.1531
(d/pull-many @conn [:db/id :resource-name :notable?]
(range 1 400))
Is the slow query time an inherent trade-off of EAV databases, or am I failing to optimize in some very obvious way?
Upvotes: 0
Views: 138
Reputation: 144
This issue was addressed and solved by Datahike maintainers in this pull request: https://github.com/replikativ/datahike/pull/653
Upvotes: 1
Reputation: 5402
This question initially came up in the DataHike channel on the Clojurians Slack, so I have edited my answers from there into a single, longer post. DataHike is one of several implementations of a Datalog query engine in Clojure/ClojureScript.
Since you are using ordinary JVM Clojure (which tends to be the most performant environment), I would expect a quicker result. My experience with Datomic is that such a query should be many times faster.
DataScript (the Datalog implementation DataHike was originally based upon) can sometimes be somewhat slow, but this result still seems too slow for a Clojure/ClojureScript environment.
The pull-api implementations in DataScript and DataHike are quite different: DataHike does somewhat more book-keeping and checking than DataScript, which makes DataHike considerably slower but also easier to get working correctly.
JIT / better performance measurement
Your query touches at most 400 entities in the database. That is probably too few iterations to trigger JIT recompilation/optimization (which is what makes all of this quicker in long-running production code). Tools like Criterium (Clojure) or Tufte (which also works in ClojureScript) run many iterations to make sure the JVM or JavaScript VM is "hot", i.e. that most of the JIT optimizations are already in place.
I suggest you test the performance with either Criterium or Tufte.
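For example (a minimal sketch; it assumes Criterium is on the classpath and reuses the conn and attribute names from the question):
(require '[criterium.core :as criterium])

;; quick-bench runs a warm-up phase and many measured iterations,
;; so the reported timing reflects JIT-compiled ("hot") performance.
(criterium/quick-bench
  (d/pull-many @conn [:db/id :resource-name :notable?] (range 1 400)))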
DataHike is still more experimental (but usable) and has a different scope than DataScript, which might make its query engine considerably slower; still, it should not be this slow (IMHO).
Upvotes: 0