Andy
Andy

Reputation: 5395

Optimise searching a given table when results require multiple JOINs

I'm working on an application which is a large database of chemical substances (approx 250,000 but rising) and associated data. I'm looking at ways to optimise the way searching is performed.

The application is running under PHP 7.0.27, MariaDB 5.5.56, and Apache 2.4.6

The application allows searching by chemical name and various chemical codes (such as EC number and CAS number). The schema is such that there are separate tables to hold the data, and the relationships of which codes apply to which chemicals.

These tables are in the database:

Note: there are other tables than the ones above where similar logic will apply, but for now I want to focus on these for this example.

It is possible for a substance to have multiple EC/CAS numbers, and a small number do not have them - i.e. it's not a simple 1:1 relationship.

The application has search fields for the substance name (substances.name), EC number (ecs.value) CAS number (cas.value). These can be used on their own, or in conjuction with each other. For example: find a substance by name, or find a substance by name and CAS number.

I believe the "quickest" way of performing a search for any given value would be to use a LIKE condition on the specific table required. So if I want to look up substances which have "acids" as part of the name:

SELECT id FROM substances WHERE name LIKE '%acids%' LIMIT 0,250

However the results that the application gives are shown in a table which includes headings for substance name, CAS number, EC number. It also allows the results to be ordered on a column (e.g. order by substance name, CAS, EC, etc). This requires JOIN conditions.

I'm doing it like this:

$sql = 'SELECT 
DISTINCT(substances.`id`),            
substances.`name`,
"" AS cas_number, 
"" AS ec_number
FROM
 substances ';
    
// Search - EC Number, or if trying to order by EC column (JOIN has to occur to make that possible)
if ( (isset($search['ecNumber'])) || (isset($order['column']) && ($order['column'] == 'ec_number')) ) {
    $sql .= ' LEFT JOIN ecs_substances ON substances.id = ecs_substances.substance_id LEFT JOIN ecs ON ecs_substances.ec_id = ecs.id ';
}

// Search - CAS Number, or if trying to order by CAS column (JOIN has to occur to make that possible)
if ( (isset($search['casNumber'])) || (isset($order['column']) && ($order['column'] == 'cas_number')) ) {
$sql .= ' LEFT JOIN cas_substances ON cas_substances.substance_id = substances.id LEFT JOIN cas ON cas_substances.cas_id = cas.id ';
}

The problem is that because of all the JOINs that are occurring it's slowing down how quickly the results can be obtained.

Benchmark: The first query I posted which just uses a LIKE condition on 1 table will execute in 140ms, whereas it's taking 506ms for the same search criteria with all of the JOIN statements in the second block of code.

I'd like to know if there are ways to optimise this such that the time taken to present results to the user decreases.

It's worth mentioning that the results are displayed in DataTables and PHP is producing a JSON feed of the results. The LIMIT 0,250 is something the end user can override by setting results per page, but I'm happy to limit them to say no more than 500 per page.

Some things I've looked into are:

Upvotes: 0

Views: 53

Answers (2)

Rick James
Rick James

Reputation: 142296

  • Consider FULLTEXT because it allows for much faster searching than LIKE with a leading wildcard %. `MATCH(col) AGAINST('+acid' IN BOOLEAN MODE)

  • Sounds like you need a "many:many" mapping table. Tips on efficiency in such: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table

  • Consider using GROUP_CONCAT(cas) for provideing a comma-list of CASs.

  • JSON does not seem practical. And even less so since you are using only MySQL 5.5.

Upvotes: 1

KIKO Software
KIKO Software

Reputation: 16688

I think a response time of half a second is quite good, given what you want to do. You must have done all necessary database optimizations? (db type, indexes, etc).

There are several things you could explore:

  1. Prepare all possible searches and store them in a database for quick access. This may sound stupid but this is how I often achieve fast searches. It's difficult for me to judge what the best way to do this, with your data, would be. You could start by adding a TEXT column to your substances table and store all the information about the substance in it: It's name, and all EC/CAS numbers. Separate the items with something like '|', or any other character not used in searches. I would call that the 'search' column. Alternatively you could make a new table, just for searching with that column in it, and the id of the substance. Now you can make one search input field for all three types of data and search in one column only. Would that work for you? Would it be faster? Possibly, but I cannot guarantee it. I don't know, but it's quite easy to try. There is a disadvantage: You would have to update that column with every change in the database.

  2. Use a proper search engine. Several are available for mariadb. Start at: https://mariadb.com/kb/en/library/about-sphinxse It basically does something far more advanced than what I described under point 1: Prepare a database with data for optimized searching.

Still, a response of half a second would be something I could live with.

Upvotes: 1

Related Questions