Bunty
Bunty

Reputation: 1

Split a string in a column to multiple columns using ssis

I have a column in sql server table in which sever name is the data. I need to split the server name in to two columns. Host name goes in to one column and instance name goes in to another column. Any help would be appreciated.

Would like to use ssis for this purpose.

example :-

Input                      Expected Output

column1                     Column2            Column3

ServerName                  Hostname           InstanceName

wsql1005x\Express           wsql1005x           Express

Upvotes: 0

Views: 2832

Answers (1)

billinkc
billinkc

Reputation: 61201

There are two routes available to you. You can use a script component or make use of the Derived Column task and the expressions there. This solution uses that approach.

You could do all of this logic in a single derived column transformation but I would not wish that maintenance nightmare on my worst of enemies. Instead, make liberal use of the Derived Columns so that you have a fighting chance of ever debugging this.

This solution does not handle splitting a potentially null column. As the source data should not allow it, I didn't want to further complicate the solution. This approach does handle the default, unnamed, instance.

nvarchar(max)

You've chosen to store your server name and instance in an nvarchar(max) field. Unless you have a reason for needing 4000+ characters, not only do you waste space and foul up the optimizer, you make it a pain for SSIS.

SSIS derives its power from operating on data in memory. It allocates memory based on the maximum size for a given column. Except for large object types, LOB, which are identified by the data types DT_TEXT, DT_NTEXT, DT_IMAGE. Those the engine cannot allocate sufficient memory for and so carries a pointer in memory to a file on disk that represents this data. Disk is slow so if you're trying to get good performance, you now know why you aren't getting it.

The trick then is to get your data out of a LOB type and into something reasonable. What's the maximum length for a computer name? Well, the fully qualified domain name is capped at 255 bytes while an instance name is limited to 16. No need to go to maths.stackexchange.com to see that you've overshot your maximum length by a factor of 10.

I have always been a fan of making my source system do the work, so in my original extract query

SELECT 'wsql1005x\Express' AS ServerName
UNION ALL
SELECT 'localhost'
UNION ALL
SELECT @@ServerName;

I would write it with an explicit cast to my maximum possible length

SELECT
    (D.ServerName AS nvarchar(542)) AS ServerName
FROM
(
    SELECT 'wsql1005x\Express' AS ServerName
    UNION ALL
    SELECT 'localhost'
    UNION ALL
    SELECT @@ServerName
) D(ServerName);

Some helpful SO questions covering the topic

On with the show

DER Find TokenPosition

The purpose of this component is to find where our token is. As the \ character is also the escape character, we must use a double backslash where we really only mean one.

In English, this expression looks to see whether a backslash exists in column ServerName. If it does, then we will store the position in a new column called TokenLocation. If it does not exist, then we will indicate the token is the final position of the string. This will allow us to slice off the server name if there is no instance attached.

TokenLocation

FINDSTRING(ServerName,"\\",1) > 0 ? FINDSTRING(ServerName,"\\",1) : LEN(ServerName)

DER EndOfFirstWord

Here we will determine the end boundary of the first word and the start boundary of the second word. If the value of the token location matches the total length of the string, then we know there is no backslash token in our string. We then look one character position to the left and the right of token to identify our stop and start boundaries. For words with no tokens, we will set both of these positions to the end of the word.

EndOfFirstWord

(TokenLocation != LEN(ServerName)) ? TokenLocation - 1 : LEN(ServerName)

BeginOfSecondWord

(TokenLocation != LEN(ServerName)) ? TokenLocation + 1 : LEN(ServerName)

DER Generate New Columns

At this point, we know

  • where our token is TokenLocation
  • where the first word begins (position 1)
  • where the first word ends (EndOfFirstWord)
  • where the second word begins (BeginOfSecondWord)
  • where the second word ends (LEN(TokenLocation))

so it simply becomes a matter of slicing up our column.

HostName

SUBSTRING(ServerName,1,EndOfFirstWord)

InstanceName

SUBSTRING(ServerName,BeginOfSecondWord,LEN(ServerName) - TokenLocation)

enter image description here

Upvotes: 1

Related Questions