Reputation: 2338
So this is something I have been wondering about after working on a project for about nine months.
We have a postgres database and are using sunspot on top of solr in a rails app.
I was not here when we decided to use Solr, so I don't really know why we chose it in the first place. Everything works great for small sets of data, but the real pain is reindexing each record after save.
This lets the index get out of date, and we end up processing these updates in delayed jobs. That gets us by for now, but each time we decide to rework the index and how it is built, the rebuild takes more than 24 hours on production and makes our client mad.
I should note here that we are searching fields on contacts that are at most 255 characters; most are only 25. There are no PDF or Word files, etc.
The end goal is fast search with some autocomplete. I also want fuzzy matching: I want Bill Smith to match BillSmith, among other things.
To do this now, I am custom-building part of the index on the contact model. This works, but each time my boss adds a requirement like "Bill Smith must match Bill-Smith", I need to rebuild the index.
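For what it's worth, the kind of normalization I mean can be sketched in plain Ruby (the helper name is hypothetical, not real code from our app or sunspot):

```ruby
# Hypothetical helper: collapse name variants into one canonical
# token before indexing, so "Bill Smith", "Bill-Smith", and
# "BillSmith" all index to the same string.
def canonical_name(raw)
  raw.downcase.gsub(/[^a-z0-9]/, "")
end

canonical_name("Bill Smith")  # => "billsmith"
canonical_name("Bill-Smith")  # => "billsmith"
canonical_name("BillSmith")   # => "billsmith"
```

Every new rule of this kind changes how tokens are produced, which is why the whole index has to be rebuilt.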
Is there something better than Solr for this purpose? In the end I would like a search with some of the same features and speed that Google has (not to that extreme), but I need the index to rebuild fast, if I need an index at all.
This is for a rails app with about 15M db records in 30 tables.
Any guidance here would be great as we are about to consider dropping solr.
EDIT: Another question: do you need a separate search index for fast search? Can't Postgres use its own indexes to get things just as fast?
Upvotes: 4
Views: 383
Reputation: 324771
I also want us to fuzzy match a search. I want Bill Smith to match BillSmith and some other things
While PostgreSQL's full-text search can help you with this sort of thing, you may find that you need to supply a custom set of stems / a custom dictionary, or even write a custom tsearch parser depending on the details of your needs.
The basic tsearch isn't all that easily customised for application-specific text handling rules.
each time my boss adds a requirement like Bill Smith must Match Bill-Smith, I need to rebuild the index
You will have that with PostgreSQL full-text search too - and adding such requirements may be trickier.
Fundamentally I think that's an issue you will have with any indexing system. In theory an index could be partially updated in such a case - e.g. removing all entries for Bill, Smith, or BillSmith, then adding them back in according to the new rules. I'm not sure any off-the-shelf system does that, though.
If you want something that performs like Google you'll probably need to throw monstrous computing resources at the task. It's amazing how fast a search can go when it's parallelized over thousands of nodes that have the data of interest cached in RAM.
Upvotes: 1
Reputation: 78523
Postgres would handle that quite well with full-text search:
http://www.postgresql.org/docs/current/static/textsearch.html
Note that it lets you use all sorts of dictionaries if you don't like the built-in rules:
http://www.postgresql.org/docs/current/static/textsearch-dictionaries.html
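To give a feel for what a dictionary does, here is a rough pure-Ruby analogue (the mapping and helper are made up for illustration; in Postgres this happens inside the tsearch pipeline, not in application code):

```ruby
# Conceptual sketch of a synonym-style dictionary: map each token
# to a canonical form before it reaches the index, so variant
# spellings of a name all match.
SYNONYMS = {
  "bill"  => "william",
  "billy" => "william",
}.freeze

def normalize_tokens(text)
  text.downcase.scan(/[a-z0-9]+/).map { |t| SYNONYMS.fetch(t, t) }
end

normalize_tokens("Bill Smith")   # => ["william", "smith"]
normalize_tokens("Billy Smith")  # => ["william", "smith"]
```

In Postgres you would express the same idea with a synonym or thesaurus dictionary attached to your text search configuration.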
It also has colorful tools, such as trigrams:
http://www.postgresql.org/docs/current/static/pgtrgm.html
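Trigram matching is probably the closest fit for your BillSmith case. A simplified Ruby sketch of the idea behind pg_trgm's similarity score (real pg_trgm extracts trigrams per word; here the whole string is stripped of non-alphanumerics first, which is a deliberate simplification):

```ruby
require "set"

# Extract the set of 3-character substrings of a padded,
# lowercased, stripped string.
def trigrams(s)
  padded = "  #{s.downcase.gsub(/[^a-z0-9]/, '')} "
  (0..padded.length - 3).map { |i| padded[i, 3] }.to_set
end

# Similarity = shared trigrams / total distinct trigrams.
def similarity(a, b)
  ta, tb = trigrams(a), trigrams(b)
  (ta & tb).size.to_f / (ta | tb).size
end

similarity("Bill Smith", "BillSmith")  # => 1.0
similarity("Bill Smith", "Bob Jones")  # small, near 0
```

With the pg_trgm extension you get the same kind of scoring via the `similarity()` function and the `%` operator, and it can be backed by a GIN or GiST index so it stays fast.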
Upvotes: 3