devth
devth

Reputation: 2750

Efficient Datomic query to perform filtering on paginated sets

Given that Datomic does not support pagination I'm wondering how to efficiently support a query such as:

Take the first 30 entities on :history/body, find entities whose :history/body matches some regex.

Here's how I'd do regex matching alone:

{:find [?e]
 :where [[?e :history/body ?body]
         [(re-find #"foo.*bar$" ?body)]]}

Observations:

  1. I could then (take ...) from those, but that is not the same as matching against the first 30 entities.
  2. I could get all entities, take 30 then manually filter with re-find, but if I have 30M entities, getting all of them just to take 30 seems wildly inefficient. Additionally: what if I wanted to take 20M out of my 30M entities and filter them via re-find?

Datomic docs talk about how queries are executed locally, but I've tried doing in-memory transformations on a set of 52913 entities (granted, they're fully touched) and it takes ~5 seconds. Imagine how bad it'd be in the millions or 10s of millions.

Upvotes: 4

Views: 1197

Answers (1)

danneu
danneu

Reputation: 9454

(Just brainstorming, here)

First of all, if you're ever using regexp, you may want to consider a fulltext index on :history/body so that you can do:

[(fulltext $ :history/body "foo*bar") [[?e]]]

(Note: You can't change :db/fulltext true/false on an existing entity schema)

Sorting is something you have to do outside the query. But depending on your data, you may be able to constrain your query to a single "page" and then apply your predicate to just those entities.

For example, if we were only paginating :history entities by an auto-incrementing :history/id, then we'd know beforehand that "Page 3" is :history/id 61 to 90.

[:find ?e
 :in $ ?min-id ?max-id
 :where
 [?e :history/id ?id]
 (<= ?min-id ?id ?max-id)
 (fulltext $ :history/body "foo*bar") [[?e]]]

Maybe something like this:

(defn get-filtered-history-page [page-n match]
  (let [per-page 30
        min-id (inc (* (dec page-n) per-page))
        max-id (+ min-id per-page)]
    (d/q '[:find ?e
           :in $ ?min-id ?max-id ?match
           :where
           [?e :history/id ?id]
           [(<= ?min-id ?id ?max-id)]
           [(fulltext $ :history/body ?match) [[?e]]]]
      (get-db) min-id max-id match)))

But, of course, the problem is that constraining the paginated set is usually based on an ordering you don't know in advance, so this isn't very helpful.

Upvotes: 1

Related Questions