Louis
Louis

Reputation: 65

Splitting of string by fixed keyword

Hi I currently have a tables with a column that I would like to split.

    ID Serial
    1  AAA"-A01-AU-234-U_xyz(CY)(REV-002)
    2  AAA"-A01-AU-234-U(CY)(REV-1)
    3  AAA"-A01-AU-234-U(CY)(REV-101)
    4  VVV"-01-AU-234-Z_ww(REV-001)
    5  VVV"-01-AU-234-Z(REV-001)_xyz(CY)
    6  V-VV"-01-AU-234-Z(REV-03)_xyz(CY)
    7  V-VV"-01-AU-234-Z-ZZZ(REV-004)_xyz(CY)

I would like to split this column into 2 field via a select statement

The first field would consist of the text from the start and end when this scenario is satisfied

  1. After the first "-
  2. take all text till the next 3 hypen (-)
  3. Take the first letter after the last hypen(-)

The second field would want to store the Value(Int) inside the (REV) bracket. Rev is always stored inside a compassing bracket (Rev-xxx) the number may stretch from 0-999 and have different form of representation

Example of output

    Field 1              Field 2
    AAA"-A01-AU-234-U    2
    AAA"-A01-AU-234-U    1
    AAA"-A01-AU-234-U    101
    VVV"-01-AU-234-Z     1
    VVV"-01-AU-234-Z     1
    V-VV"-01-AU-234-Z    3
    V-VV"-01-AU-234-Z    4

Upvotes: 1

Views: 98

Answers (4)

tctham
tctham

Reputation: 213

I have updated my answer. Is this better now?

DECLARE @Table table(ID int, SERIAL nvarchar(100));
INSERT INTO @Table(ID, SERIAL)
    VALUES  ('1', 'AAA"-A01-AU-234-U_xyz(CY)(REV-002)'),
            ('2', 'AAA"-A01-AU-234-U(CY)(REV-1)'),
            ('3', 'AAA"-A01-AU-234-U(CY)(REV-101)'),
            ('4', 'VVV"-01-AU-234-Z_ww(REV-001)'),
            ('5', 'VVV"-01-AU-234-Z(REV-001)_xyz(CY)'),
            ('6', 'VVV"-01-AU-234-Z(REV-03)_xyz(CY)'),
            ('7', 'VVV"-01-AU-234-Z(REV-004)_xyz(CY)'),
            ('8', 'AAA"-A01-AU-234-U-1111-(REV-111)'),
            ('9', 'AAA"-A01-AU-234-U-111111-5555(CY)(REV-101)'),
            ('10', 'V-VV"-01-AU-234-Z-ZZZ(REV-004)_xyz(CY)')

SELECT 
    ID,
    SERIAL,
    LEFT(SERIAL, P5.Pos + 1) AS Field1,
    CONVERT(int, SUBSTRING(SERIAL, P6.Pos, CHARINDEX(')', RIGHT(SERIAL, LEN(SERIAL) - P6.Pos)))) AS Field2
FROM @Table
    CROSS APPLY (SELECT CHARINDEX('"-', SERIAL)) AS P1(Pos)
    CROSS APPLY (SELECT CHARINDEX('-', SERIAL, P1.Pos + 1)) AS P2(Pos)
    CROSS APPLY (SELECT CHARINDEX('-', SERIAL, P2.Pos + 1)) AS P3(Pos)
    CROSS APPLY (SELECT CHARINDEX('-', SERIAL, P3.Pos + 1)) AS P4(Pos)
    CROSS APPLY (SELECT CHARINDEX('-', SERIAL, P4.Pos + 1)) AS P5(Pos)
    CROSS APPLY (SELECT CHARINDEX('REV-', SERIAL, P5.Pos + 1) + 4) AS P6(Pos)

Upvotes: 1

Jamleck
Jamleck

Reputation: 1037

Try this solution. It uses a combination of charindex and the substring function.

 DECLARE @TempTable table
 (
     id int,
     serial nvarchar(255)
 )

 insert into @TempTable values (1, 'AAA"-A01-AU-234-U_xyz(CY)(REV-002)')
 insert into @TempTable values (2, 'AAA"-A01-AU-234-U(CY)(REV-1)')
 insert into @TempTable values (3, 'AAA"-A01-AU-234-U(CY)(REV-101)')
 insert into @TempTable values (4, 'VVV"-01-AU-234-Z_ww(REV-001)')
 insert into @TempTable values (5, 'VVV"-01-AU-234-Z(REV-001)_xyz(CY)')
 insert into @TempTable values (6, 'VVV"-01-AU-234-Z(REV-03)_xyz(CY)')
 insert into @TempTable values (7, 'VVV"-01-AU-234-Z(REV-004)_xyz(CY)')

 select 
   id, 
   serial,
   substring(serial, 1, P4.Pos+1) as field1, 
   convert(int, substring(Serial, P6.Pos , P7.Pos - P6.Pos)) as field2
 from @TempTable
   cross apply (select (charindex('-', Serial))) as P1(Pos)
   cross apply (select (charindex('-', Serial, P1.Pos+1))) as P2(Pos)
   cross apply (select (charindex('-', Serial, P2.Pos+1))) as P3(Pos)
   cross apply (select (charindex('-', Serial, P3.Pos+1))) as P4(Pos)
   cross apply (select (charindex('REV-', Serial,P1.Pos+1)+4)) as P6(Pos)
   --+4 because 'REV-' is 4 chars long
   cross apply (select (charindex(')', Serial,P6.Pos+1))) as P7(Pos);

Upvotes: 1

user1613360
user1613360

Reputation: 1314

I came up with a solution in php using regular expressions.I am trying to convert it into posix standards supported by mysql.Anyways in the meanwhile you can have a look at this and it works perfect. /The first script select the values for fields 1 namely AAA"-A01-AU-234-U/

<?php


  $txt='VVV"-01-AU-234-Z(REV-001)_xyz(CY)';

  $re1='((?:[a-z][a-z0-9_]*))'; 
    $re2='.*?'; 
  $re3='(\\d+)';    
  $re4='.*?';   
  $re5='((?:[a-z][a-z0-9_]*))'; 
  $re6='.*?';   
  $re7='(\\d+)';    
  $re8='.*?';   
  $re9='([a-z])';   
echo $re1.$re2.$re3.$re4.$re5.$re6.$re7.$re8.$re9;
  if ($c=preg_match_all ("/".$re1.$re2.$re3.$re4.$re5.$re6.$re7.$re8.$re9."/is", $txt, $matches))
  {
      $var1=$matches[1][0];
      $int1=$matches[2][0];
      $var2=$matches[3][0];
      $int2=$matches[4][0];
      $w1=$matches[5][0];
      print "($var1) ($int1) ($var2) ($int2) ($w1) \n";
  }


?>

 /*The second script selects values for field 2 namely the last integer*/
<?php


  $txt='VVV"-01-AU-234-Z_ww(REV-001)';

  $re1='.*?';   
  $re2='\\d';   
  $re3='.*?';   
  $re4='\\d';   
  $re5='.*?';   
  $re6='\\d';   
  $re7='.*?';   
  $re8='\\d';   
  $re9='.*?';   
  $re10='\\d';  
  $re11='.*?';  
  $re12='\\d';  
  $re13='.*?';  
    $re14='\\d';    
  $re15='(\\d)';    

  if ($c=preg_match_all ("/".$re1.$re2.$re3.$re4.$re5.$re6.$re7.$re8.$re9.$re10.$re11.$re12.$re13.$re14.$re15."/is", $txt, $matches))
  {
      $d1=$matches[1][0];
      print "($d1) \n";
  }


?>




OUTPUT:

(VVV) (01) (AU) (234) (Z) //script 1
(1) //script 2

You can add database connection to the script and store the results in a new table.You can aslo iterate each row as input to the script and store corresponding results in the table.

Note:

The regular expression used for selecting field 1:

((?:[a-z][a-z0-9_]*)).*?(\d+).*?((?:[a-z][a-z0-9_]*)).*?(\d+).*?([a-z])

The regular expression used for selecting field 2:

 .*?\d.*?\d.*?\d.*?\d.*?\d.*?\d.*?\d(\d)

If anybody can convert the above expressions to posix standards then the user can write a simple query like

select t.serial as field 1 from table t 
where t.serial regexp 'converted exp' join 
(select t1.serial as field 2 from table t1
 where t1.serial regexp 'converted exp')q 
on q.id=t.id;

I tried to convert it but the matching constraints were lost.You should actually change ?: to ^ and ? to [^>] and //d to [0-9] or digit.Hope it helps.

Upvotes: 1

sdrzymala
sdrzymala

Reputation: 387

Maybe it is possible to make it better and faster, but at least it does work. If i will have some time more i will look at this again to think of better solution, but it do the job.

create table #t 
(
    id int,
    serial nvarchar(255)
)

go

insert into #t values (1,  'AAA"-A01-AU-234-U_xyz(CY)(REV-002)')
insert into #t values (2,  'AAA"-A01-AU-234-U(CY)(REV-1)')
insert into #t values (3, 'AAA"-A01-AU-234-U(CY)(REV-101)')
insert into #t values (4, 'VVV"-01-AU-234-Z_ww(REV-001)')
insert into #t values (5,  'VVV"-01-AU-234-Z(REV-001)_xyz(CY)')
insert into #t values (6,  'VVV"-01-AU-234-Z(REV-03)_xyz(CY)')
insert into #t values (7,  'VVV"-01-AU-234-Z(REV-004)_xyz(CY)')

go

select id, serial,
left(serial,charindex('-', serial, charindex('-', serial, charindex('-', serial, charindex('"',serial) + 2) +1) + 1) + 1) as 'Field2'
,cast( replace(left(right(serial, len(serial) - charindex('REV',serial) +1 ), CHARINDEX(')',right(serial, len(serial) - charindex('REV',serial) +1 )) - 1), 'REV-', '')as int) as 'Field1' 
from #t

go

gives me:

id  serial  Field2  Field1
1   AAA"-A01-AU-234-U_xyz(CY)(REV-002)  AAA"-A01-AU-234-U   2
2   AAA"-A01-AU-234-U(CY)(REV-1)    AAA"-A01-AU-234-U   1
3   AAA"-A01-AU-234-U(CY)(REV-101)  AAA"-A01-AU-234-U   101
4   VVV"-01-AU-234-Z_ww(REV-001)    VVV"-01-AU-234-Z    1
5   VVV"-01-AU-234-Z(REV-001)_xyz(CY)   VVV"-01-AU-234-Z    1
6   VVV"-01-AU-234-Z(REV-03)_xyz(CY)    VVV"-01-AU-234-Z    3
7   VVV"-01-AU-234-Z(REV-004)_xyz(CY)   VVV"-01-AU-234-Z    4

Upvotes: 2

Related Questions