David B
David B

Reputation: 29968

How can I accelerate this MySQL query?

geneHomology
============
id genome_name  gene_id  homolog_genome_name  homolog_gene_id consider_homolog
1  HomoSap      1007     MusMus               824             1
2  HomoSap      1007     MusMus               825             1
3  HomoSap      1007     MusMus               826             1
4  HomoSap      2890     EColi                2140            1
...

gene
====
genome_name  gene_id  gene_category
MusMus       823      Upregulated
MusMus       824      Downregulated
MusMus       825      Normal
MusMus       826      Normal
MusMus       827      Upregulated
EColi        2140     Normal
...

consider_homolog is an enum (0,1). genome_name and gene_id are primary keys in gene. geneHomology is very large - some 200M rows.

My goal is to count for each gene in genes how many homologs it has from each gene_category.

For example, following the data above, HomoSap 1007 has 3 Normal homologs and 1 Downregulated.

So my query is:

SELECT a.id,a.genome_name,a.gene_id,a.homolog_genome_name,a.homolog_gene_id,COUNT(b.gene_category)
FROM geneHomology a,gene b 
WHERE a.consider_homolog='1' AND a.homolog_genome_name=b.genome_name AND a.homolog_gene_id=b.gene_id 
GROUP BY a.genome_name,a.gene_id,b.gene_category;

It never returns (and I've been patiently waiting for more than an hour).

I already indexed gene_category in gene.

I'm really new to MySQL but I have root access to the DB so I could follow your suggestions (carefully...). I would be happy to provide any further information.

UPDATE This is the EXPLAIN output for the query:

+----+-------------+-------+------+-----------------------+----------------------+---------+----------------------------------------------------------+---------+---------------------------------+
| id | select_type | table | type | possible_keys         | key                  | key_len | ref                                                      | rows    | Extra                           |
+----+-------------+-------+------+-----------------------+----------------------+---------+----------------------------------------------------------+---------+---------------------------------+
|  1 | SIMPLE      | b     | ALL  | PRIMARY,gene_genome   | NULL                 | NULL    | NULL                                                     | 1560695 | Using temporary; Using filesort | 
|  1 | SIMPLE      | a     | ref  | geneHomologyHit_gene  | geneHomologyHit_gene | 54      | my_db_v71.b.gene_id,my_db_v71.b.genome_name              |      13 | Using where                     | 
+----+-------------+-------+------+-----------------------+----------------------+---------+----------------------------------------------------------+---------+---------------------------------+

UPDATE 2

mysql> SHOW INDEX FROM gene;
    +-------+------------+--------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+
    | Table | Non_unique | Key_name                 | Seq_in_index | Column_name         | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
    +-------+------------+--------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+
    | gene  |          0 | PRIMARY                  |            1 | gene_id             | A         |        NULL |     NULL | NULL   |      | BTREE      |         | 
    | gene  |          0 | PRIMARY                  |            2 | genome_name         | A         |     1560695 |     NULL | NULL   |      | BTREE      |         | 
    | gene  |          1 | gene_organism            |            1 | taxon_id            | A         |         392 |     NULL | NULL   |      | BTREE      |         | 
    | gene  |          1 | gene_genome              |            1 | genome_name         | A         |         853 |     NULL | NULL   |      | BTREE      |         | 
    | gene  |          1 | gene_gene_category       |            1 | gene_category       | A         |           5 |     NULL | NULL   |      | BTREE      |         | 
    +-------+------------+--------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+
    5 rows in set (0.01 sec)

UPDATE 3

mysql> SHOW INDEX FROM geneHomology;
+--------------+------------+------------------------+--------------+--------------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table        | Non_unique | Key_name               | Seq_in_index | Column_name              | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------------+------------+------------------------+--------------+--------------------------+-----------+-------------+----------+--------+------+------------+---------+
| geneHomology |          0 | PRIMARY                |            1 | id                       | A         |   680326661 |     NULL | NULL   |      | BTREE      |         | 
| geneHomology |          1 | geneHomologyQuery_gene |            1 | gene_id                  | A         |     1498516 |     NULL | NULL   |      | BTREE      |         | 
| geneHomology |          1 | geneHomologyQuery_gene |            2 | genome_name              | A         |     1505147 |     NULL | NULL   |      | BTREE      |         | 
| geneHomology |          1 | geneHomologyHit_gene   |            1 | homolog_gene_id          | A         |    52332820 |     NULL | NULL   |      | BTREE      |         | 
| geneHomology |          1 | geneHomologyHit_gene   |            2 | homolog_genome_name      | A         |    52332820 |     NULL | NULL   |      | BTREE      |         | 
+--------------+------------+------------------------+--------------+--------------------------+-----------+-------------+----------+--------+------+------------+---------+
5 rows in set (0.00 sec)

UPDATE 4 Is there a way to get only partial results to even see I'm getting what I want? I tried LIMIT 1000 and even LIMIT 10 but it doesn't seem to change anything.

UPDATE 5

mysql> SHOW CREATE TABLE geneHomology;
+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table        | Create Table                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| geneHomology | CREATE TABLE `geneHomology` (
  `id` bigint(20) NOT NULL auto_increment,
  `genome_name` varchar(20) NOT NULL,
  `gene_id` varchar(30) NOT NULL,
  `homolog_genome_name` varchar(20) NOT NULL,
  `homolog_gene_id` varchar(30) NOT NULL,
  `homolog_length` bigint(20) unsigned NOT NULL,
  `significance` double unsigned NOT NULL,
  `bit_score` double unsigned NOT NULL,
  `percent_identity` double unsigned NOT NULL,
  `start_match` int(10) unsigned NOT NULL,
  `end_match` int(10) unsigned NOT NULL,
  `start_match_percent` double unsigned NOT NULL,
  `end_match_percent` double unsigned NOT NULL,
  `strand` enum('+','-') default NULL,
  `homolog_start_match` int(10) unsigned NOT NULL,
  `homolog_end_match` int(10) unsigned NOT NULL,
  `homolog_start_match_percent` double unsigned NOT NULL,
  `homolog_end_match_percent` double unsigned NOT NULL,
  `homolog_strand` enum('+','-') default NULL,
  `consider_gene_homology` enum('0','1') NOT NULL,
  `reason_not_considered` varchar(50) default NULL,
  `num_hsps` int(10) unsigned NOT NULL,
  `homology_type` varchar(2) NOT NULL,
  PRIMARY KEY  (`id`),
  KEY `geneHomologygene` (`gene_id`,`genome_name`),
  KEY `geneHomologyhomolog_gene` (`homolog_gene_id`,`homolog_genome_name`)
) ENGINE=MyISAM AUTO_INCREMENT=680326662 DEFAULT CHARSET=latin1 | 
+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> SHOW CREATE TABLE gene;
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| gene  | CREATE TABLE `gene` (
  `taxon_id` int(10) unsigned NOT NULL,
  `genome_name` varchar(20) NOT NULL,
  `gene_id` varchar(30) NOT NULL,
  `symbol` varchar(30) default NULL,
  `type` varchar(30) default NULL,
  `product` varchar(300) default NULL,
  `strand` enum('+','-') NOT NULL,
  `start` bigint(20) unsigned NOT NULL,
  `end` bigint(20) unsigned NOT NULL,
  `gene_category` enum('Upregulated','Downregulated','Normal','n/a') NOT NULL,
  `consider_gene` enum('0','1') NOT NULL,
  `reason_not_considered` varchar(50) default NULL,
  `sequence` longblob NOT NULL,
  `additional_info` varchar(300) default NULL,
  PRIMARY KEY  (`gene_id`,`genome_name`),
  KEY `gene_organism` (`taxon_id`),
  KEY `gene_genome` (`genome_name`),
  KEY `gene_gene_category` (`gene_category`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 | 
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

Upvotes: 3

Views: 900

Answers (3)

Quassnoi
Quassnoi

Reputation: 425251

SELECT  a.genome_name, a.gene_id,
        cats.gene_category,
        (
        SELECT  COUNT(*)
        FROM    geneHomology ab
        JOIN    gene b
        ON      b.genome_name = ab.homolog_genome_name
                AND b.gene_id = ab.homolog_gene_id
        WHERE   ab.genome_name = a.genome_name
                AND ab.gene_id = a.gene_id
                AND b.gene_category = cats.gene_category
        ) cx
FROM    gene a
CROSS JOIN
        (
        SELECT  'Normal' AS gene_category
        UNION ALL
        SELECT  'Upregulated' AS gene_category
        UNION ALL
        SELECT  'Downregulated' AS gene_category
        ) cats
LIMIT 100

This will remove filesort from your plan.

If you have a table with all possible gene_categories, replace the cats with it.

Upvotes: 1

symcbean
symcbean

Reputation: 48357

First remove the reference to genome_name from the WHERE part of the query - if both gene.gene_id and gene.genome_name are unique then here is a clear functional dependency here which is confusing the issue somewhat - the number/number join will be marginally mroe efficient that the text/text join.

Looking at the plan it rather implies you've already got an index on geneHomology.hit_gene_id . If this is the case then there's not much scope for making the query go faster without schema changes. However a key length of 54 suggests you've got a lot of stuff in that index which should not be there. Trimming this down to just hit_gene_id and consider_homolog will help a little with the performance, but the limiting factor is that there does not seem any way to avoid the full table scan on gene unless there are other functional dependencies in there.

How quickly does it take to complete a 'SELECT * FROM gene' ? How many records in ene_homology?

It looks like geneHomology decomposes a N:M relationship between gene and gene (itself) and applies labels.

If the number of values in homolog_genome_name is relatively small you might consider decomposing this into gene using bitmap field. Or possibly denormalise the relationship into a set of 1:1 mappings. Alternatively you could enumerate the homolog clusters.

Upvotes: 0

Jaydee
Jaydee

Reputation: 4158

From what you've posted here, I would reccomend a minor bit of denormalisation and put gene_category into geneHomolgy. You can then get rid of the join entirely and you can create an index on your consider_homolog + GROUP BY fields.

Upvotes: 0

Related Questions