Stephen Turner
Stephen Turner

Reputation: 2604

How do I select noncontiguous characters from a string of text in MySQL?

I have a table with millions of rows and a single column of text that is exactly 11,159 characters long. It looks like this:

1202012101...(to 11,159 characters)
1202020120...
0121210212...
...
(to millions of rows)

I realize that I can use

SELECT SUBSTR(column,2,4) FROM table;

...if I wanted to pull out characters 2, 3, 4, and 5:

1202012101...
1202020120...
0121210212...
 ^^^^

But I need to extract noncontiguous characters, e.g. characters 1,5,7:

1202012101...
1202020120...
0121210212...
^   ^ ^

I realize this can be done with a query like:

SELECT CONCAT(SUBSTR(colm,1,1),SUBSTR(colm,5,1),SUBSTR(colm,7,1)) FROM table;

But this query gets very unwieldy to build for thousands of characters that I need to select. So for the first part of the question - how do I build a query that does something like this:

SELECT CHARACTERS(string,1,5,7) FROM table;

Furthermore, the indices of the characters I want to select are from a different table that looks something like this:

char_index   keep_or_discard
1            keep
2            discard
3            discard
4            discard
5            keep
7            discard
8            keep
9            discard
10           discard

So for the second part of the question, how could I build a query to select specific characters from the first table based on whether keep_or_discard="keep" for that character's index in the second table?

Upvotes: 5

Views: 532

Answers (4)

Eljakim
Eljakim

Reputation: 6937

this function does what you want:

CREATE DEFINER = `root`@`localhost` FUNCTION `test`.`getsubset`(selection mediumtext, longstring mediumtext)
RETURNS varchar(200)
LANGUAGE SQL
NOT DETERMINISTIC
CONTAINS SQL
SQL SECURITY DEFINER
COMMENT 'This function returns a subset of characters.'
BEGIN
  SET @res:='';
  SET @selection:=selection;
  WHILE @selection<>'' DO
    set @pos:=CONVERT(@selection, signed);
    set @res := concat_ws('',@res,SUBSTRING(longstring,@pos,1));
    IF LOCATE(',',@selection)=0 THEN 
       SET @selection:='';
    END IF;
    set @selection:=SUBSTRING(@selection,LOCATE(',',@selection)+1);
  END WHILE;
  RETURN @res;
END

Note: the CONVERT('1,2,3,4',signed) will yield 1, but it will give a warning.

I have it defined to be available in the database test.

The function takes two parameters; a string(!) with a list of positions, and a long string from where you want the characters taken.

An example of using this:

mysql> select * from keepdiscard;
+---------+------------+
| charind | keepordisc |
+---------+------------+
|       1 | keep       |
|       2 | discard    |
|       3 | keep       |
|       4 | discard    |
|       5 | keep       |
|       6 | keep       |
+---------+------------+
6 rows in set (0.00 sec)

mysql> select * from test;
+-------------------+
| longstring        |
+-------------------+
| abcdefghijklmnopq |
| 123456789         |
+-------------------+
2 rows in set (0.00 sec)

mysql> select getsubset(group_concat(charind ORDER BY charind),longstring) as result from keepdiscard, test  where keepordisc='keep' group by longstring;
+--------+
| result |
+--------+
| 1356   |
| acef   |
+--------+
2 rows in set, 6 warnings (0.00 sec)

The warnings stem from the fast conversion to integer that is done in the function. (See comment above)

Upvotes: 1

Matt Faus
Matt Faus

Reputation: 6691

The source of your difficulty is that your schema does not represent the true relationships between the data elements. If you wanted to achieve this with "pure" SQL, you would need a schema more like:

table
ID    Index    Char
1     0        1
1     1        2
1     2        0

charsToKeep
ID    Index    Keep
1     0        false
1     1        true
1     2        true

Then, you could perform a query like:

SELECT Char FROM table t JOIN charsToKeep c ON t.ID = c.ID WHERE c.Keep = true

However, you probably have good reasons for structuring your data the way you have (my schema requires much more storage space per character and the processing time is also probably much longer from what I am about to suggest).

Since SQL does not have the tools to understand the schema you have embedded into your table, you will need to add them with a user-defined function. Kevin's example of dynamic SQL may also work, but in my experience this is not as fast as a user-defined function.

I have done this in MS SQL many times, but never in MySql. You basically need a function, written in C or C++, that takes a comma-delimited list of the indexes you want to extract, and the string from which you want to extract them from. Then, the function will return a comma-delimited list of those extracted values. See these links for a good starting point:

http://dev.mysql.com/doc/refman/5.1/en/adding-functions.html

http://dev.mysql.com/doc/refman/5.1/en/adding-udf.html

To build the concatenated list of indexes you want to extract from the char_index table, try the group_concat function:

http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_group-concat

Hope this helps!

Upvotes: 0

riku
riku

Reputation: 763

You can write a php script to do this for you:

<?php
    //mysql connect
    $conn = mysql_connect('localhost', 'mysql_user', 'mysql_password');

    if (!$conn) {
        echo 'Unable to connect to DB: ' . mysql_error();
        exit;
    }

    //database connect
    $db = mysql_select_db('mydb');

    if (!$db) {
        echo 'Unable to select mydb: ' . mysql_error();
        exit;
    }

    //get the keep numbers you’re going to use.
    //and change the number into string so, for example, instead of 5 you get 'SUBSTR(colm,5,1)'
    $result = mysql_query("SELECT number FROM number_table WHERE keep_or_discard='keep'");
    $numbers = array();
    while ($row = mysql_fetch_assoc($result)) {
        $row = 'SUBSTR(colm,' . $row . ',1)';
        $numbers = $row;
    }

    //implode the array so you get one long string with all the substrings
    //eg. 'SUBSTR(colm,1,1),SUBSTR(colm,5,1),SUBSTR(colm,12,1)'
    $numbers = implode(",", $numbers);

    //pull the numbers you need and save them to an array.
    $result = mysql_query("SELECT " . $numbers . " FROM table");
    $concat = array();
    while ($row = mysql_fetch_assoc($result)) {
        $concat= $row;
    }

And there you have an array with the correct numbers.

I'm sorry if you can't/don't want to use PHP for this, I just don't really know how to do this without PHP, Perl, Python or some other similar language. Hopefully this solution will help somehow...

Upvotes: 0

Kevin Burton
Kevin Burton

Reputation: 11924

How about dynamic sql? (You will need to build the select part of the query)

CREATE PROCEDURE example_procedure()
BEGIN
  --
  --build the concat values here    
  --       
  SET @ids := '';
  SET @S = 'SELECT @ids := built_concat_of_values FROM table';
  PREPARE n_StrSQL FROM @S;
  EXECUTE n_StrSQL;
  DEALLOCATE PREPARE n_StrSQL;    
END

Upvotes: 0

Related Questions