Ragda
Ragda

Reputation: 107

how to split a string to 4 characters sub strings then concat them by hyphen?

I need to write a mysql query that does the following:

I have list of serial numbers in a database table, the query should only update serial number rows where the length of a serial number is more than 15, then each serial number is divided to 4-digits sub-strings separated by a hyphen. example :

'10028641232912356' should become '1002-8641-1232-9123-56'

I started with this, which only inserts a hyphen after first 4 :

SELECT serial_no, CHAR_LENGTH(TRIM(serial_no)) as 'length', CONCAT_WS('-',SUBSTRING(TRIM(serial_no),1,4), SUBSTRING(TRIM(serial_no),5,4)) as result FROM pos where serial_no is not null and CHAR_LENGTH(TRIM(serial_no))>=15 ;

this is only the select statement, at first I just want to get (as in select) the new format of the serial no then i'll update it, but since I dont know the exact length of each serial number I need to figure out this part:

CONCAT_WS('-',SUBSTRING(TRIM(serial_no),1,4)

This is must only be done using mysql functions

Any help is appreciated

Upvotes: 2

Views: 1103

Answers (2)

Kickstart
Kickstart

Reputation: 21513

Making an assumption that the serial number is unique then the following should do it:-

SELECT serial_no, GROUP_CONCAT(SUBSTRING(serial_no, anInt, 4) ORDER BY anInt SEPARATOR '-')
FROM
(
    SELECT serial_no, anInt
    FROM pos
    CROSS JOIN
    (
        SELECT (4 * (units.i + 10 * tens.i + 100 * hundreds.i)) + 1 AS anInt
        FROM (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) units
        CROSS JOIN (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) tens
        CROSS JOIN (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) hundreds
    ) sub1
    WHERE LENGTH(serial_no) >= anInt
    AND LENGTH(serial_no) > 15
) sub2
GROUP BY serial_no

This will cope with a serial number up to ~4000 characters long.

This works by using a series of unioned fixed queries to get the numbers 0 to 9. This is cross joined against itself 3 times with the first being units, the next being tens and the last being hundreds (you can carry on adding more if you want). From these the numbers between 0 and 999 are generated, then multiplied by 4 and with 1 added (so giving 1 to 3997 in steps of 4), which is the starting position of each group of 4. The WHERE clause checks that this generated number is less than the length of serial_no (if it is you land up with duplicates), and that serial_no is longer than 15.

This will generate a list of all the numbers, each one repeated as many times as their are groups of 4 numbers (or partial groups), along with the start position of a group.

The outer SELECT then takes this list and uses substring to extract each group, and uses GROUP_CONCAT to join he results together again with '-' as the separator between each group. If also specifies the start position of each group as the order to join them again (would probably be fine without this, but I wouldn't guarantee it).

SQL fiddle here:-

http://www.sqlfiddle.com/#!2/eb2d0/2

Upvotes: 1

Alma Do
Alma Do

Reputation: 37365

Introduction

Since we've figured out that your strings can have any length; your issue boils down to how to split strings with any length in to it's chunks of some length in MySQL. Unfortunately there is a serious problem in MySQL that prevent us from doing this in "native" way. And this problem is - MySQL does not support sequences. That means you can not use some sort of internal iterator like construct to loop over your string.


But there is always a way

Building sequence

We can use a trick to do this. First part of trick: use CROSS JOIN to produce the desired row sets. If you are not aware how it works then I'll remind you. It will produce a Cartesian product of the two row sets. The second part of the trick: use a well known formula.

N = d1x101 + d2x102 + ...

Actually you can do that with any base, not just 10. For this demonstration I will use 10. Usage:

SELECT 
  n1.i+10*n2.i AS num
FROM 
  (SELECT 0 as i UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) as n1 
  CROSS JOIN 
  (SELECT 0 as i UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) as n2

So we're using UNION ALL for the numbers 0..9 to produce a multiplication part (our di). With the query above you'll get consecutive 0..99 numbers. This is because we're using base powers till 2 only, and 102=100. You can check fiddle to make sure that it will work properly.

Iterating through string

Now with these "pseudo-generator" we can emulate iteration through the string. To do that, there are MySQL variables that will be our iterator. Of course separation of string piece is a work for SUBSTR(). So basic skel is:

SELECT 
  SUBSTR(@str, @i:=@i+@len, @len) as chunk 
FROM 
  (SELECT 0 as i UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) as n1 
  CROSS JOIN 
  (SELECT 0 as i UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) as n2 
  CROSS JOIN 
  (SELECT @str:='ABCD1234EFGH5678IJKL90', @len:=4, @i:=1-@len) as init 
LIMIT 6;

(fiddle for sample above is here). We are just iterating through sequence and then using the iterator to create the correct offset. All that is left to do now is gather our string and insert the hyphens. Hopefully there is GROUP_CONCAT() in MySQL for that:

SELECT 
  GROUP_CONCAT(chunk SEPARATOR '-') AS string 
FROM 
  (SELECT 
    SUBSTR(@str, @i:=@i+@len, @len) as chunk 
  FROM 
    (SELECT 0 as i UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) as n1 
    CROSS JOIN 
    (SELECT 0 as i UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) as n2 
    CROSS JOIN 
    (SELECT @str:='ABCD1234EFGH5678IJKL90', @len:=4, @i:=1-@len) as init
  ) AS data 
WHERE chunk!='';

(again, fiddle is here).

With whole table

Now it is a sample and you want to select that from a table. It will be more complicated:

SELECT 
  serial_no, 
  GROUP_CONCAT(chunk SEPARATOR '-') AS serial 
FROM 
  (SELECT 
    SUBSTR(
      IF(@str=serial_no, @str, serial_no), 
      @i:=IF(@str=serial_no, @i+@len, 1), 
      @len
    ) AS chunk, 
    @str:=serial_no AS serial_no 
  FROM 
    (SELECT 
      serial_no 
    FROM 
      pos 
      CROSS JOIN 
        (SELECT 0 as i UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) as n1 
      CROSS JOIN 
        (SELECT 0 as i UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) as n2 
    ORDER BY 
      serial_no) AS data 
  CROSS JOIN 
    (SELECT @len:=4, @i:=1-@len) AS init) AS seq 
WHERE 
  chunk!='' 
GROUP BY 
  serial_no;

Fiddle is available here. There is another trick to produce the row sets. We'll use CROSS JOIN now instead of initializing string. So we should pass the current pos value of the serial_no. In order to make sure all the rows will be iterated properly we have to order them (that is done with inner ORDER BY).

Limitations

Well as you already know, sequence is limited with 99; thus, with our @len defined as 4 we'll be able to split only by a 4000 length string maximum. Also this will use the whole sequence in any case. Even if your string is much shorter (chances are it is). Thus performance impact may have place.


Is it worth the effort?

My point is: mostly, no. It may be ok to use it once, maybe. But it won't be re-usable and it won't be readable. Thus there is little sense in doing such string operations with DBMS functions because we have applications for such things. It should be used for that and using it you actually can create re-usable/scalable/or/whatever code.

Another way may be to create stored procedure in which we do the desired thing (so, split the string by given length & concatenate it with given delimiter). But honestly, it's just an attempt to "hide the problem". Because even if it would be re-usable, it still will have the same weakness; performance impact. Even more so if we're going to create code for the DBMS. Then again, why don't we place that code in the application? In 99% of cases DBMS is the place for data storage and the application is the place for code (e.g. logic). Mixing these two things almost always ends in a bad result.

Upvotes: 4

Related Questions