Reputation: 2750
Given that Datomic does not support pagination I'm wondering how to efficiently support a query such as:
Take the first 30 entities on
:history/body
, find entities whose:history/body
matches some regex.
Here's how I'd do regex matching alone:
{:find [?e]
:where [[?e :history/body ?body]
[(re-find #"foo.*bar$" ?body)]]}
Observations:
(take ...)
from those, but that is not the same as matching against the first 30 entities.take 30
then manually filter with re-find
, but if I have 30M entities, getting all of them just to take 30
seems wildly inefficient. Additionally: what if I wanted to take 20M out of my 30M entities and filter them via re-find
?Datomic docs talk about how queries are executed locally, but I've tried doing in-memory transformations on a set of 52913 entities (granted, they're fully touch
ed) and it takes ~5 seconds. Imagine how bad it'd be in the millions or 10s of millions.
Upvotes: 4
Views: 1197
Reputation: 9454
(Just brainstorming, here)
First of all, if you're ever using regexp, you may want to consider a fulltext index on :history/body so that you can do:
[(fulltext $ :history/body "foo*bar") [[?e]]]
(Note: You can't change :db/fulltext true/false
on an existing entity schema)
Sorting is something you have to do outside the query. But depending on your data, you may be able to constrain your query to a single "page" and then apply your predicate to just those entities.
For example, if we were only paginating :history
entities by an auto-incrementing :history/id
, then we'd know beforehand that "Page 3" is :history/id
61 to 90.
[:find ?e
:in $ ?min-id ?max-id
:where
[?e :history/id ?id]
(<= ?min-id ?id ?max-id)
(fulltext $ :history/body "foo*bar") [[?e]]]
Maybe something like this:
(defn get-filtered-history-page [page-n match]
(let [per-page 30
min-id (inc (* (dec page-n) per-page))
max-id (+ min-id per-page)]
(d/q '[:find ?e
:in $ ?min-id ?max-id ?match
:where
[?e :history/id ?id]
[(<= ?min-id ?id ?max-id)]
[(fulltext $ :history/body ?match) [[?e]]]]
(get-db) min-id max-id match)))
But, of course, the problem is that constraining the paginated set is usually based on an ordering you don't know in advance, so this isn't very helpful.
Upvotes: 1