Reputation: 329
I am trying to take a blob out of an old MySQL table and make a new table for it, in an effort to reach first normal form. However, to covert the data already in the database from blob into multiple rows in the new table is turning out to be no easy feat.
What is the easiest way to achieve the conversion with SQL command?
Parent Table:
CREATE TABLE TEST.People (
`id` INT AUTO_INCREMENT,
`age` INT,
`height` INT,
`weight` INT ,
`variations` BLOB DEFAULT NULL,
PRIMARY KEY (`id`),
);
New Table:
CREATE TABLE TEST.Variations (
`id` INT AUTO_INCREMENT,
`chr` INT,
`start` INT,
`stop` INT ,
`type` ENUM('SNP','INDEL','CNV') DEFAULT NULL,
PRIMARY KEY (`id`),
);
When I run SELECT id,variations FROM TEST.People; I get:
+----+----------------------------------------------------------------------------------------------------------------------+
| id | variations |
+----+----------------------------------------------------------------------------------------------------------------------+
| 3 | xp t !3:124093754-124467278/CNVt 7:78030601-79638023/CNV |
| 6 | xp |
| 9 | xp |
| 12 | xp t !1:84289718-85466763/CNV |
| 15 | xp |
| 18 | xp |
| 21 | xp |
| 24 | xp |
| 27 | xp |
| 30 | xp t !10:166909544-166909544/SNPt !2:66903445-66903445/SNPt !2:166897864-166897864/CNVt !7:6892788-6892788/SNP |
+----+----------------------------------------------------------------------------------------------------------------------+
So what I want the TEST.Variations table to have after conversion is this:
+----+-----+-----------+-----------+----------+
| id | chr | start | stop | type |
+----+-----+-----------+-----------+----------+
| 3 | 3 | 124093754 | 124467278 | CNV |
| 3 | 7 | 78030601 | 79638023 | CNV |
| 12 | 1 | 84289718 | 85466763 | CNV |
| 30 | 10 | 166909544 | 166909544 | SNP |
| 30 | 2 | 66903445 | 66903445 | SNP |
| 30 | 2 | 166897864 | 166897864 | CNV |
| 30 | 7 | 6892788 | 6892788 | SNP |
+----+-----+-----------+-----------+----------+
Upvotes: 1
Views: 491
Reputation: 92805
First of all two things:
You have an inconsistency in data for id 3.There is no !
right before 7:...
. I hope that is just a typo
xp t !3:124093754-124467278/CNVt 7:78030601-79638023/CNV
^^
If you want to have an auto_increment
column in your target table then you schema should look something like this
CREATE TABLE variations
(
`var_id` INT NOT NULL AUTO_INCREMENT,
`id` INT, -- id from People goes here and it's not UNIQUE
`chr` INT,
`start` INT,
`stop` INT ,
`type` ENUM('SNP','INDEL','CNV') DEFAULT NULL,
PRIMARY KEY (`var_id`)
);
Now you can transfer data from People
to Variations
table with a query
INSERT INTO variations (id, chr, start, stop, type)
SELECT id,
SUBSTRING_INDEX(variation, ':', 1) chr,
SUBSTRING_INDEX(SUBSTRING_INDEX(variation, '-', 1), ':', -1) start,
SUBSTRING_INDEX(SUBSTRING_INDEX(variation, '-', -1), '/', 1) stop,
SUBSTRING_INDEX(variation, '/', -1) type
FROM
(
SELECT p.id, SUBSTRING_INDEX(SUBSTRING_INDEX(p.variations, 't !', n.n), 't !', -1) variation
FROM
(
SELECT id, SUBSTR(variations, 9) variations
FROM people
WHERE variations LIKE 'xp t !%'
) p CROSS JOIN
(
SELECT a.N + b.N * 10 + 1 n
FROM
(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
ORDER BY n
) n
WHERE n.n <= 1 + (LENGTH(p.variations) - LENGTH(REPLACE(p.variations, 't !', ''))) / 3
ORDER BY id
) q
ORDER BY id, chr, start, stop, type;
Note: this query will split up to 100 variations per id. If you need more or less you can adjust a limit by editing the inner subquery with n
alias, which produces number (tally) table on the fly.
Result:
| VAR_ID | ID | CHR | START | STOP | TYPE | |--------|----|-----|-----------|-----------|------| | 1 | 3 | 3 | 124093754 | 124467278 | CNV | | 2 | 3 | 7 | 78030601 | 79638023 | CNV | | 3 | 12 | 1 | 84289718 | 85466763 | CNV | | 4 | 30 | 10 | 166909544 | 166909544 | SNP | | 5 | 30 | 2 | 166897864 | 166897864 | CNV | | 6 | 30 | 2 | 66903445 | 66903445 | SNP | | 7 | 30 | 7 | 6892788 | 6892788 | SNP |
Here is SQLFiddle demo
Upvotes: 1