Mathieu Brousseau
Mathieu Brousseau

Reputation: 134

Quick search on huge mysql database

I need to index a huge mysql database ( 5 billions record ). I will rarely create, update or delete

only some request like SELECT * FROM tbl_person WHERE name LIKE 'foo%'

I have already set my index on field.

With one will be better for best performance on read request

1 single request on a 5billions records table or 10 request on 500millions records table and join result after.

Upvotes: 2

Views: 11805

Answers (2)

jon__o
jon__o

Reputation: 1519

There are a lot of things that you can do. First, if you regularly search by the persons name, consider partitioning your table by the first or first few letters of the persons name. See Partitioning Types.

For example, since Integers are MUCH faster to search by, you could make a field called name_abbr that is a smallint that represents the persons first 2 or 3 letters of the their name. You would index and partition off of this field! So, aaa would be 1, aab would be 2, and so on. Your query would look something like this:

SELECT * FROM Table WHERE name_abbr=123 AND name LIKE 'foo%';

Now, this will hit the correct partition and the LIKE will only have to check on a much smaller record set.

There are a lot of other things that you can do, but remember that with large data sets, it's always best to break down the data into groups and always try to use integers for queries whenever possible. Also, make sure to use the EXPLAIN keyword to make sure that your queries are using the indexes and partitions that you want them to use.

Upvotes: 9

Matthew
Matthew

Reputation: 10444

You're never going to get "good" performance using the LIKE clause on a MySQL db that size because it cannot may not use the index.

If you're going to need efficient queries like that then you should consider different storage designs. A common method is to have separate, indexed columns for each length of the string.

For example:

  1. foobar
  2. fooba
  3. foob
  4. foo
  5. fo
  6. f

you will significantly increase the data in your warehouse, but your requirements will demonstrate if this is an appropriate trade-off... is space cheaper than performance? Only you know the answer.

EDIT: I see the second part of your question, now. You ask whether it's preferable to have one query on 5b records or 10 query on 500m rec and UNION the result....

I am inclined to say that you're almost certain to have better results with the single query unless you have an effective sharding system to branch off the queries.... you're going to have to test it to verify on your existing infrastructure.

Upvotes: 2

Related Questions