Randy B.
Randy B.

Reputation: 453

Splitting a String by character and parsing it into multiple columns in another table

I am looking to take a string of directory path and parse information out of it into existing columns on another table. This is for the purpose of creating a staging table for reporting. It will be parsing many directory paths if the ProjectName is applicable to the change in structure.

Data Example:

Table1_Column1
ProjectName\123456_ProjectShortName\Release_1\Iteration\etc


Expected Output:

Table2_Column1, Table2_Column2
123456          ProjectShortName

I've figured out how to parse some strings by character but it seems a bit clunky and inefficient. Is there a better structure to go about this? To add some more to it, this is just one column I need to manipulate before shifting it over there are three other columns that are being directly shifted to the staging table based on the ProjectName.

Is it better to just create a UDF to split then call it within the job that will move the data or is there another way?

Upvotes: 0

Views: 91

Answers (2)

LukStorms
LukStorms

Reputation: 29647

Here's a method without a UDF.
It uses charindex and substring to get the parts from that path string.

An example using a table variable:

declare @T table (Table1_Column1 varchar(100));
insert into @T values 
('ProjectName\123456_ProjectShortName\Release_1\Iteration\etc'),
('OtherProjectName\789012_OtherProjectShortName\Release_2\Iteration\xxx');

select
case 
when FirstBackslashPos > 0 and FirstUnderscorePos > 0 
then substring(Col1,FirstBackslashPos+1,FirstUnderscorePos-FirstBackslashPos-1) 
end as Table1_Column1,
case 
when FirstUnderscorePos > 0 and SecondBackslashPos > 0 
then substring(Col1,FirstUnderscorePos+1,SecondBackslashPos-FirstUnderscorePos-1) 
end as Table1_Column2
from (
  select 
  Table1_Column1 as Col1, 
  charindex('\',Table1_Column1) as FirstBackslashPos,
  charindex('_',Table1_Column1) as FirstUnderscorePos,
  charindex('\',Table1_Column1,charindex('\',Table1_Column1)+1) as SecondBackslashPos
from @T
) q;

If you want to calculate only one into a variable

declare @ProjectPath varchar(100);
set @ProjectPath = 'ProjectName\123456_ProjectShortName\Release_1\Iteration\etc';

declare @FirstBackslashPos int  = charindex('\',@ProjectPath);
declare @FirstUnderscorePos int = charindex('_',@ProjectPath,@FirstBackslashPos);
declare @SecondBackslashPos int = charindex('\',@ProjectPath,@FirstBackslashPos+1);
declare @ProjectNumber varchar(30) = case when @FirstBackslashPos > 0 and @FirstUnderscorePos > 0 then substring(@ProjectPath,@FirstBackslashPos+1,@FirstUnderscorePos-@FirstBackslashPos-1)end;
declare @ProjectShortName varchar(30) = case when @FirstUnderscorePos > 0 and @SecondBackslashPos > 0 then substring(@ProjectPath,@FirstUnderscorePos+1,@SecondBackslashPos-@FirstUnderscorePos-1) end;

select @ProjectNumber as ProjectNumber, @ProjectShortName as ProjectShortName;

But i.m.h.o. it might be worth the effort to add some CLR that brings true regex matching to the SQL server. Since CHARINDEX and PATINDEX are not as flexible as regex.

Upvotes: 1

John Cappelletti
John Cappelletti

Reputation: 81950

The following is a SUPER fast Parser but it is limited to 8K bytes. Notice the Returned Sequence Number... Perhaps you can key off of that because I am still not clear on the logic for why column1 is 123456 and not ProjectName

Declare @String varchar(max) = 'ProjectName\123456_ProjectShortName\Release_1\Iteration\etc'

Select * from [dbo].[udf-Str-Parse-8K](@String,'\')

Returns

RetSeq  RetVal
1       ProjectName
2       123456_ProjectShortName
3       Release_1
4       Iteration
5       etc

The UDF if needed

CREATE FUNCTION [dbo].[udf-Str-Parse-8K] (@String varchar(max),@Delimiter varchar(10))
Returns Table 
As
Return (  
    with   cte1(N)   As (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)),
           cte2(N)   As (Select Top (IsNull(DataLength(@String),0)) Row_Number() over (Order By (Select NULL)) From (Select N=1 From cte1 a,cte1 b,cte1 c,cte1 d) A ),
           cte3(N)   As (Select 1 Union All Select t.N+DataLength(@Delimiter) From cte2 t Where Substring(@String,t.N,DataLength(@Delimiter)) = @Delimiter),
           cte4(N,L) As (Select S.N,IsNull(NullIf(CharIndex(@Delimiter,@String,s.N),0)-S.N,8000) From cte3 S)

    Select RetSeq = Row_Number() over (Order By A.N)
          ,RetVal = Substring(@String, A.N, A.L) 
    From   cte4 A
);
--Much faster than str-Parse, but limited to 8K
--Select * from [dbo].[udf-Str-Parse-8K]('Dog,Cat,House,Car',',')
--Select * from [dbo].[udf-Str-Parse-8K]('John||Cappelletti||was||here','||')

Upvotes: 0

Related Questions