bnp0005
bnp0005

Reputation: 329

MySQL Converting Database Column From Blob to separate pieces

I am trying to take a blob out of an old MySQL table and make a new table for it, in an effort to reach first normal form. However, to covert the data already in the database from blob into multiple rows in the new table is turning out to be no easy feat.

What is the easiest way to achieve the conversion with SQL command?

Parent Table:

CREATE  TABLE TEST.People (
  `id` INT  AUTO_INCREMENT,
  `age` INT,
  `height` INT,
  `weight` INT  ,
  `variations` BLOB DEFAULT NULL,
  PRIMARY KEY (`id`), 
);

New Table:

CREATE  TABLE TEST.Variations (
  `id` INT  AUTO_INCREMENT,
  `chr` INT,
  `start` INT,
  `stop` INT  ,
  `type` ENUM('SNP','INDEL','CNV') DEFAULT NULL,
  PRIMARY KEY (`id`), 
);

When I run SELECT id,variations FROM TEST.People; I get:

+----+----------------------------------------------------------------------------------------------------------------------+
| id | variations                                                                                                           |
+----+----------------------------------------------------------------------------------------------------------------------+
|  3 | xp   t !3:124093754-124467278/CNVt 7:78030601-79638023/CNV                                                           |
|  6 | xp                                                                                                                   |
|  9 | xp                                                                                                                   |
| 12 | xp   t !1:84289718-85466763/CNV                                                                                      |
| 15 | xp                                                                                                                   |
| 18 | xp                                                                                                                   |
| 21 | xp                                                                                                                   |
| 24 | xp                                                                                                                   |
| 27 | xp                                                                                                                   |
| 30 | xp   t !10:166909544-166909544/SNPt !2:66903445-66903445/SNPt !2:166897864-166897864/CNVt !7:6892788-6892788/SNP     |
+----+----------------------------------------------------------------------------------------------------------------------+

So what I want the TEST.Variations table to have after conversion is this:

+----+-----+-----------+-----------+----------+
| id | chr | start     | stop      | type     |  
+----+-----+-----------+-----------+----------+
|  3 |   3 | 124093754 | 124467278 | CNV      |
|  3 |   7 |  78030601 |  79638023 | CNV      |
| 12 |   1 |  84289718 |  85466763 | CNV      |
| 30 |  10 | 166909544 | 166909544 | SNP      |
| 30 |   2 |  66903445 |  66903445 | SNP      |
| 30 |   2 | 166897864 | 166897864 | CNV      |
| 30 |   7 |   6892788 |   6892788 | SNP      |
+----+-----+-----------+-----------+----------+

Upvotes: 1

Views: 491

Answers (1)

peterm
peterm

Reputation: 92805

First of all two things:

  1. You have an inconsistency in data for id 3.There is no ! right before 7:.... I hope that is just a typo

    xp   t !3:124093754-124467278/CNVt 7:78030601-79638023/CNV
                                      ^^
    
  2. If you want to have an auto_increment column in your target table then you schema should look something like this

    CREATE  TABLE variations 
    (
      `var_id` INT NOT NULL AUTO_INCREMENT,
      `id`    INT, -- id from People goes here and it's not UNIQUE
      `chr`   INT,
      `start` INT,
      `stop`  INT ,
      `type`  ENUM('SNP','INDEL','CNV') DEFAULT NULL,
      PRIMARY KEY (`var_id`) 
    );
    

Now you can transfer data from People to Variations table with a query

INSERT INTO variations (id, chr, start, stop, type)
SELECT id, 
       SUBSTRING_INDEX(variation, ':', 1) chr,
       SUBSTRING_INDEX(SUBSTRING_INDEX(variation, '-', 1), ':', -1) start,
       SUBSTRING_INDEX(SUBSTRING_INDEX(variation, '-', -1), '/', 1) stop,
       SUBSTRING_INDEX(variation, '/', -1) type
  FROM
(
  SELECT p.id, SUBSTRING_INDEX(SUBSTRING_INDEX(p.variations, 't !', n.n), 't !', -1) variation
    FROM 
  (
    SELECT id, SUBSTR(variations, 9) variations
      FROM people 
     WHERE variations LIKE 'xp   t !%'
  ) p CROSS JOIN 
  (
     SELECT a.N + b.N * 10 + 1 n
       FROM 
      (SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
     ,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
      ORDER BY n
  ) n
   WHERE n.n <= 1 + (LENGTH(p.variations) - LENGTH(REPLACE(p.variations, 't !', ''))) / 3
   ORDER BY id
) q
 ORDER BY id, chr, start, stop, type;

Note: this query will split up to 100 variations per id. If you need more or less you can adjust a limit by editing the inner subquery with n alias, which produces number (tally) table on the fly.

Result:

| VAR_ID | ID | CHR |     START |      STOP | TYPE |
|--------|----|-----|-----------|-----------|------|
|      1 |  3 |   3 | 124093754 | 124467278 |  CNV |
|      2 |  3 |   7 |  78030601 |  79638023 |  CNV |
|      3 | 12 |   1 |  84289718 |  85466763 |  CNV |
|      4 | 30 |  10 | 166909544 | 166909544 |  SNP |
|      5 | 30 |   2 | 166897864 | 166897864 |  CNV |
|      6 | 30 |   2 |  66903445 |  66903445 |  SNP |
|      7 | 30 |   7 |   6892788 |   6892788 |  SNP |

Here is SQLFiddle demo

Upvotes: 1

Related Questions