Reputation: 41
How to remove unwanted characters between text in SSIS
We have data like this when it is sourced from SharePoint:
2134;#Adam Connor (aconnor),21987;#Tatanka Wabe (Twabe);#
I tried SUBSTRING, REPLACE, etc., but was not able to remove the numbers between the names.
I want the output to be:
Adam Connor, Tatanka Wabe
Upvotes: 4
Views: 5008
Reputation: 81950
If the sample data represents the pattern, and you are open to a table-valued function, the approach below may help.
Some time ago, tired of extracting strings (LEFT, RIGHT, SUBSTRING, CHARINDEX, PATINDEX, etc.), I modified a parse function to accept two different delimiters. In this case, they are # and (.
Example
Declare @YourTable table (ID int,SomeCol varchar(max))
Insert Into @YourTable values
(1,'2134;#Adam Connor (aconnor),21987;#Tatanka Wabe (Twabe);#')

Select A.ID
      ,B.*
 From  @YourTable A
 Cross Apply (
                Select NewVal = Stuff((Select ', ' +ltrim(rtrim(RetVal))
                                        From  [dbo].[tvf-Str-Extract](A.SomeCol,'#','(')
                                        For XML Path ('')
                                      ),1,2,'')
             ) B
Returns
ID  NewVal
1   Adam Connor, Tatanka Wabe
The Function if Interested
CREATE FUNCTION [dbo].[tvf-Str-Extract] (@String varchar(max),@Delimiter1 varchar(100),@Delimiter2 varchar(100))
Returns Table
As
Return (
    -- cte1: 10 seed rows; cte2: tally 1..DataLength(@String), capped at 10^6 rows
    with cte1(N) As (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)),
         cte2(N) As (Select Top (IsNull(DataLength(@String),0)) Row_Number() over (Order By (Select NULL)) From (Select N=1 From cte1 N1,cte1 N2,cte1 N3,cte1 N4,cte1 N5,cte1 N6) A ),
         -- cte3: segment start positions (1, plus the position just after each @Delimiter1)
         cte3(N) As (Select 1 Union All Select t.N+DataLength(@Delimiter1) From cte2 t Where Substring(@String,t.N,DataLength(@Delimiter1)) = @Delimiter1),
         -- cte4: start position and length up to the next @Delimiter1 (8000 if none follows)
         cte4(N,L) As (Select S.N,IsNull(NullIf(CharIndex(@Delimiter1,@String,s.N),0)-S.N,8000) From cte3 S)
    -- Keep only segments that contain @Delimiter2 and return the text before it
    Select RetSeq = Row_Number() over (Order By N)
          ,RetPos = N
          ,RetVal = left(RetVal,charindex(@Delimiter2,RetVal)-1)
     From (
            Select *,RetVal = Substring(@String, N, L)
             From  cte4
          ) A
     Where charindex(@Delimiter2,RetVal)>1
)
/*
Max Length of String 1MM characters
Declare @String varchar(max) = 'Dear [[FirstName]] [[LastName]], ...'
Select * From [dbo].[tvf-Str-Extract] (@String,'[[',']]')
*/
Note:
If you were to simply run
Declare @YourTable table (ID int,SomeCol varchar(max))
Insert Into @YourTable values
(1,'2134;#Adam Connor (aconnor),21987;#Tatanka Wabe (Twabe);#')

Select A.ID
      ,B.*
 From  @YourTable A
 Cross Apply [dbo].[tvf-Str-Extract](A.SomeCol,'#','(') B
You would get
ID  RetSeq  RetPos  RetVal
1   1       7       Adam Connor
1   2       36      Tatanka Wabe
Upvotes: 1
Reputation: 37313
Note: The code is in VB.NET.
You need to extract the strings between # and (:
Dim mc As MatchCollection = Regex.Matches(strContent, "(?<=\#)(.*?)(?=\()", RegexOptions.Singleline)
Then you need to join them, separated by a comma. Each match is trimmed because the captured text includes the space before the (:
String.Join(", ", mc.Cast(Of Match)().Select(Function(m) m.Value.Trim()))
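The two steps above can be tried outside SSIS before wiring them into a package. The following is a minimal console sketch, not part of the original answer: the module name ExtractNamesDemo and the helper ExtractNames are hypothetical, and the sketch assumes the data always follows the #Name (login) pattern from the question. It trims each match and joins with a comma and a space so that the result matches the output requested in the question.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module ExtractNamesDemo
    ' Hypothetical helper: returns the text between each "#" and the next "(",
    ' joined with ", ". Assumes the "#Name (login)" pattern from the question.
    Function ExtractNames(ByVal input As String) As String
        Dim mc As MatchCollection = Regex.Matches(input, "(?<=\#)(.*?)(?=\()", RegexOptions.Singleline)
        ' Trim each match: the captured text includes the space before "(".
        Return String.Join(", ", mc.Cast(Of Match)().Select(Function(m) m.Value.Trim()))
    End Function

    Sub Main()
        Dim sample As String = "2134;#Adam Connor (aconnor),21987;#Tatanka Wabe (Twabe);#"
        ' Prints: Adam Connor, Tatanka Wabe
        Console.WriteLine(ExtractNames(sample))
    End Sub
End Module
```

The trailing # in the sample produces no match because no ( follows it, so no empty entry is appended to the result.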
You can use a Script Component to achieve this with a regular expression.
Assuming that Column0 is the input column and OutColumn is the output column:
Imports System
Imports System.Data
Imports System.Linq
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
Imports System.Text.RegularExpressions

<Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute> _
<CLSCompliant(False)> _
Public Class ScriptMain
    Inherits UserComponent

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        If Not Row.Column0_IsNull AndAlso _
           Not String.IsNullOrEmpty(Row.Column0.Trim) Then
            Dim strContent As String = Row.Column0
            ' Capture the text between each "#" and the next "("
            Dim mc As MatchCollection = Regex.Matches(strContent, "(?<=\#)(.*?)(?=\()", RegexOptions.Singleline)
            ' Trim each match (it includes the space before "(") and join with ", "
            Row.OutColumn = String.Join(", ", mc.Cast(Of Match)().Select(Function(m) m.Value.Trim()))
        Else
            Row.OutColumn_IsNull = True
        End If
    End Sub
End Class
Upvotes: 1