Anil Kumar

Reputation: 41

SSIS remove unwanted characters

How can I remove unwanted characters from between the names in a text column in SSIS?

For example, we have data like this:

2134;#Adam Connor (aconnor),21987;#Tatanka Wabe (Twabe);# 

when it is sourced from SharePoint. I tried SUBSTRING, REPLACE, etc., but was not able to remove the numbers between the names.

I want the output to be:

Adam Connor, Tatanka Wabe

Upvotes: 4

Views: 5008

Answers (2)

John Cappelletti

Reputation: 81950

If the sample data represents the pattern and you are open to a table-valued function, the following approach may work.

Some time ago, tired of extracting strings with LEFT, RIGHT, SUBSTRING, CHARINDEX, PATINDEX, etc., I modified a parse function to accept two different delimiters. In this case they are # and (.

Example

Declare @YourTable table (ID int,SomeCol varchar(max))
Insert Into @YourTable values
(1,'2134;#Adam Connor (aconnor),21987;#Tatanka Wabe (Twabe);#')

Select A.ID
      ,B.*
 From  @YourTable A
 Cross Apply (
                Select NewVal = Stuff((Select ', ' +ltrim(rtrim(RetVal)) 
                                         From [dbo].[tvf-Str-Extract](A.SomeCol,'#','(') 
                                         For XML Path ('')
                                      ),1,2,'')
             ) B

Returns

ID  NewVal
1   Adam Connor, Tatanka Wabe

The function, if interested:

CREATE FUNCTION [dbo].[tvf-Str-Extract] (@String varchar(max),@Delimiter1 varchar(100),@Delimiter2 varchar(100))
Returns Table 
As
Return (  

with   cte1(N)   As (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)),
       cte2(N)   As (Select Top (IsNull(DataLength(@String),0)) Row_Number() over (Order By (Select NULL)) From (Select N=1 From cte1 N1,cte1 N2,cte1 N3,cte1 N4,cte1 N5,cte1 N6) A ),
       cte3(N)   As (Select 1 Union All Select t.N+DataLength(@Delimiter1) From cte2 t Where Substring(@String,t.N,DataLength(@Delimiter1)) = @Delimiter1),
       cte4(N,L) As (Select S.N,IsNull(NullIf(CharIndex(@Delimiter1,@String,s.N),0)-S.N,8000) From cte3 S)

Select RetSeq = Row_Number() over (Order By N)
      ,RetPos = N
      ,RetVal = left(RetVal,charindex(@Delimiter2,RetVal)-1) 
 From  (
        Select *,RetVal = Substring(@String, N, L) 
         From  cte4
       ) A
 Where charindex(@Delimiter2,RetVal)>1

)
/*
Max length of string: 1,000,000 characters (cte2 generates at most 10^6 row numbers)

Declare @String varchar(max) = 'Dear [[FirstName]] [[LastName]], ...'
Select * From [dbo].[tvf-Str-Extract] (@String,'[[',']]')
*/

Note:

If you were to simply run

Declare @YourTable table (ID int,SomeCol varchar(max))
Insert Into @YourTable values
(1,'2134;#Adam Connor (aconnor),21987;#Tatanka Wabe (Twabe);#')

Select A.ID
      ,B.*
 From  @YourTable A
 Cross Apply [dbo].[tvf-Str-Extract](A.SomeCol,'#','(')  B

You would get

ID  RetSeq  RetPos  RetVal
1   1       7       Adam Connor 
1   2       36      Tatanka Wabe 

Upvotes: 1

Hadi

Reputation: 37313

You can use regular expressions.

Note: the code is in VB.NET.

First, extract the strings between # and (:

Dim mc As MatchCollection = Regex.Matches(strContent, "(?<=\#)(.*?)(?=\()", RegexOptions.Singleline)

Then join them with a comma and a space. Each match ends with the space that precedes the (, so trim it first:

String.Join(", ", mc.Cast(Of Match)().Select(Function(m) m.Value.Trim()))
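To check the pattern outside SSIS, the two steps can be combined in a small console program. This is only a sketch: the module name RegexCheck and the helper ExtractNames are hypothetical and not part of the answer, and the sample string is the one from the question. It trims each match because the text captured before ( ends with a space.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module RegexCheck

    ' Hypothetical helper: returns the text found between "#" and "("
    ' as a comma-separated list.
    Function ExtractNames(ByVal strContent As String) As String
        ' Lookbehind for "#", lazy capture, lookahead for "(".
        Dim mc As MatchCollection = Regex.Matches(strContent, "(?<=\#)(.*?)(?=\()", RegexOptions.Singleline)

        ' Trim each match, then join with ", ".
        Return String.Join(", ", mc.Cast(Of Match)().Select(Function(m) m.Value.Trim()))
    End Function

    Sub Main()
        ' Sample value taken from the question.
        Dim sample As String = "2134;#Adam Connor (aconnor),21987;#Tatanka Wabe (Twabe);#"
        Console.WriteLine(ExtractNames(sample)) ' prints: Adam Connor, Tatanka Wabe
    End Sub

End Module
```

The trailing ;# in the sample produces no match, because no ( follows the last #.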

SSIS Version - Using Script Component

You can use a Script Component to do this with a regular expression.

The code below assumes that Column0 is the input column and OutColumn is the output column:

Imports System  
Imports System.Data  
Imports System.Math  
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper  
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper  
Imports System.Linq
Imports System.Text.RegularExpressions

<Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute> _  
<CLSCompliant(False)> _  
Public Class ScriptMain  
    Inherits UserComponent  

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

        If Not Row.Column0_IsNull AndAlso _
           Not String.IsNullOrEmpty(Row.Column0.Trim) Then

            Dim strContent As String = Row.Column0

            Dim mc As MatchCollection = Regex.Matches(strContent, "(?<=\#)(.*?)(?=\()", RegexOptions.Singleline)

            ' Trim each match (it ends with the space before the parenthesis) and join with ", "
            Row.OutColumn = String.Join(", ", mc.Cast(Of Match)().Select(Function(m) m.Value.Trim()))

        Else 

            Row.OutColumn_IsNull = True

        End If

    End Sub  

End Class  


Upvotes: 1
