yazz.com
yazz.com

Reputation: 58796

How can I create an index for fast access on a clojure hash table?

I wish to store many records in a clojure hash table. If I wish to get fast access to certain records using a certain field or range query then what options do I have, without having to resort to storing the data in a database (which is where the data came from in the first place).

I guess I'm also wondering whether an STM is the right place for a large indexed data set as well.

Upvotes: 3

Views: 490

Answers (2)

Alex Miller
Alex Miller

Reputation: 70241

Depending how far you want to push this, you're asking to build an in-memory database. I assume you don't actually want to do that or presumably use one of the many in-memory Java databases that already exist (Derby, H2, etc).

If you want indexed or range access to multiple attributes of your data, then you need to create all of those indexes in Clojure data structures. Clojure maps will give you O(log32 n) time access to data (worse than constant, but still very bounded). If you need better than that, you can use Java maps like HashMap or ConcurrentHashMap directly with the caveat that you're outside the Clojure data model. For range access, you'll want some sort of sorted tree data structure... Java has ConcurentSkipListMap which is pretty great for what it does. If that's not good enough, you might need your own btree impl.

If you're not changing this data, then Clojure's STM is immaterial. Is this data treated as a cache of a subset of the database? If so, you might consider using a cache library like Ehcache instead (they've recently added support for very large off-heap caches and search capabilities).

Balancing data between in-memory cache and persistent store is a tricky business and one of the most important things to get right in data-heavy apps.

Upvotes: 5

mikera
mikera

Reputation: 106391

You'll probably want to create separate indexes for each field using a sorted-map so that you can do range queries. Under the hood this uses something like a persistent version of a Java TreeMap.

STM shouldn't be an issue if you are mostly interested in read access. In fact it might even prove better than mutable tables since:

  • Reads don't require any locking
  • You can make a consistent snapshot of data and indexes at the same time.

Upvotes: 2

Related Questions