Reputation: 2604
I have a table with millions of rows and a single column of text that is exactly 11,159 characters long. It looks like this:
1202012101...(to 11,159 characters)
1202020120...
0121210212...
...
(to millions of rows)
I realize that I can use
SELECT SUBSTR(column,2,4) FROM table;
...if I wanted to pull out characters 2, 3, 4, and 5:
1202012101...
1202020120...
0121210212...
^^^^
But I need to extract noncontiguous characters, e.g. characters 1,5,7:
1202012101...
1202020120...
0121210212...
^ ^ ^
I realize this can be done with a query like:
SELECT CONCAT(SUBSTR(colm,1,1),SUBSTR(colm,5,1),SUBSTR(colm,7,1)) FROM table;
But this query gets very unwieldy to build for thousands of characters that I need to select. So for the first part of the question - how do I build a query that does something like this:
SELECT CHARACTERS(string,1,5,7) FROM table;
Furthermore, the indices of the characters I want to select are from a different table that looks something like this:
char_index keep_or_discard
1 keep
2 discard
3 discard
4 discard
5 keep
7 discard
8 keep
9 discard
10 discard
So for the second part of the question, how could I build a query to select specific characters from the first table based on whether keep_or_discard="keep" for that character's index in the second table?
Upvotes: 5
Views: 532
Reputation: 6937
this function does what you want:
CREATE DEFINER = `root`@`localhost` FUNCTION `test`.`getsubset`(selection mediumtext, longstring mediumtext)
RETURNS varchar(200)
LANGUAGE SQL
NOT DETERMINISTIC
CONTAINS SQL
SQL SECURITY DEFINER
COMMENT 'This function returns a subset of characters.'
BEGIN
SET @res:='';
SET @selection:=selection;
WHILE @selection<>'' DO
set @pos:=CONVERT(@selection, signed);
set @res := concat_ws('',@res,SUBSTRING(longstring,@pos,1));
IF LOCATE(',',@selection)=0 THEN
SET @selection:='';
END IF;
set @selection:=SUBSTRING(@selection,LOCATE(',',@selection)+1);
END WHILE;
RETURN @res;
END
Note: the CONVERT('1,2,3,4',signed) will yield 1, but it will give a warning.
I have it defined to be available in the database test.
The function takes two parameters; a string(!) with a list of positions, and a long string from where you want the characters taken.
An example of using this:
mysql> select * from keepdiscard;
+---------+------------+
| charind | keepordisc |
+---------+------------+
| 1 | keep |
| 2 | discard |
| 3 | keep |
| 4 | discard |
| 5 | keep |
| 6 | keep |
+---------+------------+
6 rows in set (0.00 sec)
mysql> select * from test;
+-------------------+
| longstring |
+-------------------+
| abcdefghijklmnopq |
| 123456789 |
+-------------------+
2 rows in set (0.00 sec)
mysql> select getsubset(group_concat(charind ORDER BY charind),longstring) as result from keepdiscard, test where keepordisc='keep' group by longstring;
+--------+
| result |
+--------+
| 1356 |
| acef |
+--------+
2 rows in set, 6 warnings (0.00 sec)
The warnings stem from the fast conversion to integer that is done in the function. (See comment above)
Upvotes: 1
Reputation: 6691
The source of your difficulty is that your schema does not represent the true relationships between the data elements. If you wanted to achieve this with "pure" SQL, you would need a schema more like:
table
ID Index Char
1 0 1
1 1 2
1 2 0
charsToKeep
ID Index Keep
1 0 false
1 1 true
1 2 true
Then, you could perform a query like:
SELECT Char FROM table t JOIN charsToKeep c ON t.ID = c.ID WHERE c.Keep = true
However, you probably have good reasons for structuring your data the way you have (my schema requires much more storage space per character and the processing time is also probably much longer from what I am about to suggest).
Since SQL does not have the tools to understand the schema you have embedded into your table, you will need to add them with a user-defined function. Kevin's example of dynamic SQL may also work, but in my experience this is not as fast as a user-defined function.
I have done this in MS SQL many times, but never in MySql. You basically need a function, written in C or C++, that takes a comma-delimited list of the indexes you want to extract, and the string from which you want to extract them from. Then, the function will return a comma-delimited list of those extracted values. See these links for a good starting point:
http://dev.mysql.com/doc/refman/5.1/en/adding-functions.html
http://dev.mysql.com/doc/refman/5.1/en/adding-udf.html
To build the concatenated list of indexes you want to extract from the char_index table, try the group_concat function:
http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_group-concat
Hope this helps!
Upvotes: 0
Reputation: 763
You can write a php script to do this for you:
<?php
//mysql connect
$conn = mysql_connect('localhost', 'mysql_user', 'mysql_password');
if (!$conn) {
echo 'Unable to connect to DB: ' . mysql_error();
exit;
}
//database connect
$db = mysql_select_db('mydb');
if (!$db) {
echo 'Unable to select mydb: ' . mysql_error();
exit;
}
//get the keep numbers you’re going to use.
//and change the number into string so, for example, instead of 5 you get 'SUBSTR(colm,5,1)'
$result = mysql_query("SELECT number FROM number_table WHERE keep_or_discard='keep'");
$numbers = array();
while ($row = mysql_fetch_assoc($result)) {
$row = 'SUBSTR(colm,' . $row . ',1)';
$numbers = $row;
}
//implode the array so you get one long string with all the substrings
//eg. 'SUBSTR(colm,1,1),SUBSTR(colm,5,1),SUBSTR(colm,12,1)'
$numbers = implode(",", $numbers);
//pull the numbers you need and save them to an array.
$result = mysql_query("SELECT " . $numbers . " FROM table");
$concat = array();
while ($row = mysql_fetch_assoc($result)) {
$concat= $row;
}
And there you have an array with the correct numbers.
I'm sorry if you can't/don't want to use PHP for this, I just don't really know how to do this without PHP, Perl, Python or some other similar language. Hopefully this solution will help somehow...
Upvotes: 0
Reputation: 11924
How about dynamic sql? (You will need to build the select part of the query)
CREATE PROCEDURE example_procedure()
BEGIN
--
--build the concat values here
--
SET @ids := '';
SET @S = 'SELECT @ids := built_concat_of_values FROM table';
PREPARE n_StrSQL FROM @S;
EXECUTE n_StrSQL;
DEALLOCATE PREPARE n_StrSQL;
END
Upvotes: 0