imolit
imolit

Reputation: 8332

MySQL query combining several tables

Background

In order to obtain data for my thesis I have to work with a large, fairly complicated MySQL database, containing several tables and hundreds of GBs of data. Unfortunately, I am new to SQL, and can't really figure out how to extract the data that I need.

Database

The database consists of several tables that I want to combine. Here are the relevant parts of it:

> show tables;
+---------------------------+
| Tables_in_database        |
+---------------------------+
| Build                     |
| Build_has_ModuleRevisions |
| Configuration             |
| ModuleRevisions           |
| Modules                   |
| Product                   |
| TestCase                  |
| TestCaseResult            |
+---------------------------+

The tables are linked together in the following manner

Product ---(1:n)--> Configurations ---(1:n)--> Build

Build ---(1:n)--> Build_has_ModuleRevisions ---(n:1)--> ModuleRevision ---(n:1)--> Modules

Build ---(1:n)--> TestCaseResult ---(n:1)--> TestCase

The contents of the tables are

> describe Product;
+---------+--------------+------+-----+---------+----------------+
| Field   | Type         | Null | Key | Default | Extra          |
+---------+--------------+------+-----+---------+----------------+
| id      | int(11)      | NO   | PRI | NULL    | auto_increment |
| name    | varchar(255) | NO   | UNI | NULL    |                |
+---------+--------------+------+-----+---------+----------------+


> describe Configuration;
+------------+--------------+------+-----+---------+----------------+
| Field      | Type         | Null | Key | Default | Extra          |
+------------+--------------+------+-----+---------+----------------+
| id         | int(11)      | NO   | PRI | NULL    | auto_increment |
| Product_id | int(11)      | YES  | MUL | NULL    |                |
| name       | varchar(255) | NO   | UNI | NULL    |                |
+------------+--------------+------+-----+---------+----------------+


> describe Build;
+------------------+--------------+------+-----+---------+----------------+
| Field            | Type         | Null | Key | Default | Extra          |
+------------------+--------------+------+-----+---------+----------------+
| id               | int(11)      | NO   | PRI | NULL    | auto_increment |
| Configuration_id | int(11)      | NO   | MUL | NULL    |                |
| build_number     | int(11)      | NO   | MUL | NULL    |                |
| build_id         | varchar(32)  | NO   | MUL | NULL    |                |
| test_status      | varchar(255) | NO   |     |         |                |
| start_time       | datetime     | YES  | MUL | NULL    |                |
| end_time         | datetime     | YES  | MUL | NULL    |                |
+------------------+--------------+------+-----+---------+----------------+


> describe Build_has_ModuleRevisions;
+-------------------+----------+------+-----+---------+----------------+
| Field             | Type     | Null | Key | Default | Extra          |
+-------------------+----------+------+-----+---------+----------------+
| id                | int(11)  | NO   | PRI | NULL    | auto_increment |
| Build_id          | int(11)  | NO   | MUL | NULL    |                |
| ModuleRevision_id | int(11)  | NO   | MUL | NULL    |                |
+-------------------+----------+------+-----+---------+----------------+


> describe ModuleRevisions;
+-----------+--------------+------+-----+---------+----------------+
| Field     | Type         | Null | Key | Default | Extra          |
+-----------+--------------+------+-----+---------+----------------+
| id        | int(11)      | NO   | PRI | NULL    | auto_increment |
| Module_id | int(11)      | NO   | MUL | NULL    |                |
| tag       | varchar(255) | NO   | MUL |         |                |
| revision  | varchar(255) | NO   | MUL |         |                |
+-----------+--------------+------+-----+---------+----------------+


> describe Modules;
+---------+--------------+------+-----+---------+----------------+
| Field   | Type         | Null | Key | Default | Extra          |
+---------+--------------+------+-----+---------+----------------+
| id      | int(11)      | NO   | PRI | NULL    | auto_increment |
| name    | varchar(255) | NO   | UNI | NULL    |                |
+---------+--------------+------+-----+---------+----------------+


> describe TestCase;
+--------------+--------------+------+-----+---------+----------------+
| Field        | Type         | Null | Key | Default | Extra          |
+--------------+--------------+------+-----+---------+----------------+
| id           | int(11)      | NO   | PRI | NULL    | auto_increment |
| TestSuite_id | int(11)      | NO   | MUL | NULL    |                |
| classname    | varchar(255) | NO   | MUL | NULL    |                |
| name         | varchar(255) | NO   | MUL | NULL    |                |
| testtype     | varchar(255) | NO   | MUL | NULL    |                |
+--------------+--------------+------+-----+---------+----------------+


> describe TestCaseResult;
+-------------+--------------+------+-----+---------+----------------+
| Field       | Type         | Null | Key | Default | Extra          |
+-------------+--------------+------+-----+---------+----------------+
| id          | int(11)      | NO   | PRI | NULL    | auto_increment |
| Build_id    | int(11)      | NO   | MUL | NULL    |                |
| TestCase_id | int(11)      | NO   | MUL | NULL    |                |
| status      | varchar(255) | NO   | MUL | NULL    |                |
| start_time  | datetime     | YES  | MUL | NULL    |                |
| end_time    | datetime     | YES  | MUL | NULL    |                |
+-------------+--------------+------+-----+---------+----------------+

As you can see the tables are linked with *_id fields. E.g. TestCaseResult is linked to a Build by the Build_id field, and to a TestCase by the TestCase_id field.

Problem Desciption

Now to my problem. Given a specific Configuration.name and Product.name as input, I need to find all modules+revisions and failed testcases, for every Build, sorted by Build.start_time.

What I Have Tried

The following query gives me all the Builds given a Configuration.name of config1 and a Product.name of product1

SELECT
    *
FROM
    `database`.`Build` AS b
        JOIN
    Configuration AS c ON c.id = b.Configuration_id
        JOIN
    Product as p ON p.id = c.Product_id
WHERE
    c.name = 'config1'
        AND p.name = 'product1'
ORDER BY b.start_time;

This doesn't even solve half of my problem, though. Now, for every build I need to

  1. Find all Modules linked to the Build
    • Extract the Modules.name field
    • Extract the ModuleRevision.revision field
  2. Find all TestCases linked to the Build
    • Where TestCaseResult.status = 'failure'
    • Extract the TestCase.name field linked to the TestCaseResult
  3. Associate the Build with the extracted module name+revisions and testcase names
  4. Present the data ordered by Build.start_time so that I can perform analyses on it.

In other words, of all the data available, I am only interested in linking the fields Modules.name, ModuleRevision.revision, TestCaseResult.status, and TestCaseResult.name to a particular Build, order this by Build.start_time and then output this to a Python program I have written.

The end result should be something similar to

Build Build.start_time    Modules+Revisions               Failed tests
    1         20140301    [(mod1, rev1), (mod2... etc]    [test1, test2, ...]
    2         20140401    [(mod1, rev2), (mod2... etc]    [test1, test2, ...]
    3         20140402    [(mod3, rev1), (mod2... etc]    [test1, test2, ...]
    4         20140403    [(mod1, rev3), (mod2... etc]    [test1, test2, ...]
    5         20140505    [(mod5, rev2), (mod2... etc]    [test1, test2, ...]

My question

Is there a good (and preferrably efficient) SQL query that can extract and present the data that I need?

If not, I am totally okay with extracting one or several supersets/subsets of the data in order to parse it with Python if necessary. But how do I extract the desired data?

Upvotes: 2

Views: 67

Answers (1)

David Sanders
David Sanders

Reputation: 4139

It looks to me like you'd need more than one query for this. The problem is that the relationships of Build <-> ModuleRevision and Build <- TestCaseResult are basically independent. ModuleRevisions and TestCaseResults don't really have anything to do with each other as far as the schema is concerned. You have to query for one and then the other. You can't get them both in one query because because each row in your results basically represents one record of the "deepest" related table (in this case, either ModuleRevision or TestCaseResult) including any related information from its parent tables. Therefore, I think you'd need something like the following:

SELECT
    M.name, MR.revision, B.id
FROM
    ModuleRevisions MR
INNER JOIN
    Modules M ON MR.Module_id = M.id
INNER JOIN
    Build_has_ModuleRevisions BHMR ON MR.id = BHMR.ModuleRevision_id
INNER JOIN
    Build B ON BHMR.Build_id = B.id
INNER JOIN
    Configuration C ON B.Configuration_id = C.id
INNER JOIN
    Product P ON C.Product_id = P.id
WHERE C.name = 'config1' AND P.name = 'product1'
ORDER BY B.start_time;

SELECT
    TCR.status, TC.name, B.id
FROM
    TestCaseResult TCR
INNER JOIN
    TestCase TC ON TCR.TestCase_id = TC.id
INNER JOIN
    Build B ON TCR.Build_id = B.id
INNER JOIN
    Configuration C ON B.Configuration_id = C.id
INNER JOIN
    Product P ON C.Product_id = P.id
WHERE C.name = 'config1' AND P.name = 'product1' and TCR.status = 'failure'
ORDER BY B.start_time;

Upvotes: 1

Related Questions