Reputation: 33
I've been working on treating a sql table, and splitting the data. I've come to splitting some initials from the last name. The only problem is, the initials are spaced out. For example (data from my table)
Hanse J S P
> J S P
are the initialsGerson B D V
> B D V
are the initialsJ D Timberland
> J D
are the initialsSo basically, it's up to four initials, that can be either at the begin, middle, or end of the string. I'm at a loss as to how I should import these. into a seperate column where the result will be:
COL A | COL B
J S P | Jansen
B D V | Gerson
J D | Timberland
Can anyone please point me in the right direction? I'm using SQL Server.
Upvotes: 3
Views: 1424
Reputation: 272066
This one uses CHARINDEX
and recursive CTE to extract space delimited substrings from name:
Once you have the substrings, it is only a matter of gluing them back:
WITH yourdata(FullName) AS (
SELECT 'Hanse J S P' UNION
SELECT 'Gerson B D V' UNION
SELECT 'J D Timberland' UNION
SELECT 'TEST 1 TEST 2 TEST 3'
), cte AS (
SELECT
FullName,
CASE WHEN Pos1 = 0 THEN FullName ELSE SUBSTRING(FullName, 1, Pos1 - 1) END AS LeftPart,
CASE WHEN Pos1 = 0 THEN Null ELSE SUBSTRING(FullName, Pos1 + 1, Pos2 - Pos1) END AS NextPart,
1 AS PartSort
FROM yourdata
CROSS APPLY (SELECT CHARINDEX(' ', FullName) AS Pos1, LEN(FullName) AS Pos2) AS CA
UNION ALL
SELECT
FullName,
CASE WHEN Pos1 = 0 THEN NextPart ELSE SUBSTRING(NextPart, 1, Pos1 - 1) END,
CASE WHEN Pos1 = 0 THEN Null ELSE SUBSTRING(NextPart, Pos1 + 1, Pos2 - Pos1) END,
PartSort + 1
FROM cte
CROSS APPLY (SELECT CHARINDEX(' ', NextPart) AS Pos1, LEN(NextPart) AS Pos2) AS CA
WHERE NextPart IS NOT NULL
)
SELECT yourdata.FullName, STUFF(CA1.XMLStr, 1, 1, '') AS Initials, STUFF(CA2.XMLStr, 1, 1, '') AS Names
FROM yourdata
CROSS APPLY (
SELECT CONCAT(' ', LeftPart)
FROM cte
WHERE FullName = yourdata.FullName AND LEN(LeftPart) = 1
ORDER BY PartSort
FOR XML PATH('')
) AS CA1(XMLStr)
CROSS APPLY (
SELECT CONCAT(' ', LeftPart)
FROM cte
WHERE FullName = yourdata.FullName AND LEN(LeftPart) > 1
ORDER BY PartSort
FOR XML PATH('')
) AS CA2(XMLStr)
Result:
| FullName | Initials | Names |
|----------------------|----------|----------------|
| Gerson@B@D@V | B D V | Gerson |
| Hanse@J@S@P | J S P | Hanse |
| J@D@Timberland | J D | Timberland |
| TEST@1@TEST@2@TEST@3 | 1 2 3 | TEST TEST TEST |
Upvotes: 1
Reputation: 2017
I used some built-in functions for this. The general idea is to use string_split
to split the string into rows, use ROW_NUMBER
to save the order according to length and the char(s) position in the string, then use FOR XML PATH()
to concatenate from rows to a single column.
--Assume your data structure
DECLARE @temp TABLE (thestring varchar(1000))
INSERT INTO @temp VALUES
('Hanse J S P'), ('Gerson B D V'), ('J D Timberland')
;WITH CTE AS
(
SELECT *
,ROW_NUMBER() OVER (PARTITION BY thestring ORDER BY thestring, LEN(value) ASC, pos ASC) [order]
FROM (
SELECT *
, value AS [theval]
, CHARINDEX(CASE WHEN len(value) = 1 THEN ' ' + value ELSE value END, thestring) AS [pos]
FROM @temp CROSS APPLY string_split(thestring, ' ')
) AS dT
)
SELECT ( SELECT value + ' ' AS [text()]
FROM cte
WHERE cte.thestring = T.thestring
AND LEN(theval) = 1
FOR XML PATH('')
) AS [COL A]
,( SELECT value + ' ' AS [text()]
FROM cte
WHERE cte.thestring = T.thestring
AND LEN(theval) > 1
FOR XML PATH('')
) AS [COL B]
FROM @temp T
GROUP BY thestring
Produces output:
COL A COL B
----- -----
B D V Gerson
J S P Hanse
J D Timberland
Upvotes: 1
Reputation: 81930
Similar to JNevil's answer (+1), but not limited to 4 tokens.
Example
Declare @YourTable table (SomeCol varchar(50))
Insert Into @YourTable values
('Hanse J S P')
,('Gerson B D V')
,('J D Timberland')
,('J D Timberland / J R R Tolkien')
Select A.SomeCol
,ColA = ltrim(
concat(IIF(len(Pos1)=1,' '+Pos1,null)
,IIF(len(Pos2)=1,' '+Pos2,null)
,IIF(len(Pos3)=1,' '+Pos3,null)
,IIF(len(Pos4)=1,' '+Pos4,null)
,IIF(len(Pos5)=1,' '+Pos5,null)
,IIF(len(Pos6)=1,' '+Pos6,null)
,IIF(len(Pos7)=1,' '+Pos7,null)
,IIF(len(Pos8)=1,' '+Pos8,null)
,IIF(len(Pos9)=1,' '+Pos9,null)
)
)
,ColB = ltrim(
concat(IIF(Pos1 not Like '[a-z]',' '+Pos1,null)
,IIF(Pos2 not Like '[a-z]',' '+Pos2,null)
,IIF(Pos3 not Like '[a-z]',' '+Pos3,null)
,IIF(Pos4 not Like '[a-z]',' '+Pos4,null)
,IIF(Pos5 not Like '[a-z]',' '+Pos5,null)
,IIF(Pos6 not Like '[a-z]',' '+Pos6,null)
,IIF(Pos7 not Like '[a-z]',' '+Pos7,null)
,IIF(Pos8 not Like '[a-z]',' '+Pos8,null)
,IIF(Pos9 not Like '[a-z]',' '+Pos9,null)
)
)
From @YourTable A
Cross Apply (
Select Pos1 = xDim.value('/x[1]','varchar(max)')
,Pos2 = xDim.value('/x[2]','varchar(max)')
,Pos3 = xDim.value('/x[3]','varchar(max)')
,Pos4 = xDim.value('/x[4]','varchar(max)')
,Pos5 = xDim.value('/x[5]','varchar(max)')
,Pos6 = xDim.value('/x[6]','varchar(max)')
,Pos7 = xDim.value('/x[7]','varchar(max)')
,Pos8 = xDim.value('/x[8]','varchar(max)')
,Pos9 = xDim.value('/x[9]','varchar(max)')
From (Select Cast('<x>' + replace(SomeCol,' ','</x><x>')+'</x>' as xml) as xDim) as A
) B
Returns
SomeCol ColA ColB
Hanse J S P J S P Hanse
Gerson B D V B D V Gerson
J D Timberland J D Timberland
J D Timberland / J R R Tolkien J D / J R R Timberland / Tolkien
Upvotes: 1
Reputation: 50034
Here's a rather hamfisted way of doing it by abusing the Parsename
function. The big caveat here is that Parsename
is limited to 4 tokens so J S P Jansen
will work but J S P C Jansen
or John J S P Jansen
will not.
With parsedname AS
(
SELECT
PARSENAME(replace(name, ' ', '.'), 1) name1,
PARSENAME(replace(name, ' ', '.'), 2) name2,
PARSENAME(replace(name, ' ', '.'), 3) name3,
PARSENAME(replace(name, ' ', '.'), 4) name4
FROM yourtable
)
SELECT
CASE WHEN LEN(name4) = 1 THEN name4 ELSE '' END +
CASE WHEN LEN(name3) = 1 THEN name3 ELSE '' END +
CASE WHEN LEN(name2) = 1 THEN name2 ELSE '' END +
CASE WHEN LEN(name1) = 1 THEN name1 ELSE '' END as initials,
CASE WHEN LEN(name1) > 1 THEN name1
WHEN LEN(name2) > 1 THEN name2
WHEN LEN(name3) > 1 THEN name3
WHEN LEN(name4) > 1 THEN name4
END as surname
FROM parsedname
Here is a sqlfiddle of this in action
CREATE TABLE NAMES (name varchar(50));
INSERT INTO NAMES VALUES ('J S P Jansen');
INSERT INTO NAMES VALUES ('B D V Gerson');
INSERT INTO NAMES VALUES ('J D Timberland');
With parsedname AS
(
SELECT
PARSENAME(replace(name, ' ', '.'), 1) name1,
PARSENAME(replace(name, ' ', '.'), 2) name2,
PARSENAME(replace(name, ' ', '.'), 3) name3,
PARSENAME(replace(name, ' ', '.'), 4) name4
FROM names
)
SELECT
CASE WHEN LEN(name4) = 1 THEN name4 ELSE '' END +
CASE WHEN LEN(name3) = 1 THEN name3 ELSE '' END +
CASE WHEN LEN(name2) = 1 THEN name2 ELSE '' END +
CASE WHEN LEN(name1) = 1 THEN name1 ELSE '' END as initials,
CASE WHEN LEN(name1) > 1 THEN name1
WHEN LEN(name2) > 1 THEN name2
WHEN LEN(name3) > 1 THEN name3
WHEN LEN(name4) > 1 THEN name4
END as surname
FROM parsedname
+----------+------------+
| initials | surname |
+----------+------------+
| JSP | Jansen |
| BDV | Gerson |
| JD | Timberland |
+----------+------------+
If a space is needed in between those letters you can just flip around that CASE statement to something like:
TRIM(CASE WHEN LEN(name4) = 1 THEN name4 + ' ' ELSE '' END +
CASE WHEN LEN(name3) = 1 THEN name3 + ' ' ELSE '' END +
CASE WHEN LEN(name2) = 1 THEN name2 + ' ' ELSE '' END +
CASE WHEN LEN(name1) = 1 THEN name1 + ' ' ELSE '' END) as initials
+----------+------------+
| initials | surname |
+----------+------------+
| J S P | Jansen |
| B D V | Gerson |
| J D | Timberland |
+----------+------------+
Upvotes: 2
Reputation: 407
Which version of SQL Server do you have? Is STRING_SPLIT() available?
If yes, split using the space as a delimiter, iterate through the resulting strings, evaluate their length and concatenate a result string with the string when said string is one character in length and is a letter.
Add a space before unless the result string is so far empty.
If STRING_SPLIT() is not available... Well... Here are a few solutions:
T-SQL split string based on delimiter
-- Addendum
To your second part of the question (which did not originally exist when I originally posted my reply) where you would like to isolate the non-initials part into a second column, I would basically separate two blocks of logic with two result strings based on the length of each element.
Note: this is not going to be very elegant in pre-2016 SQL Server and may even require a CURSOR (sigh)
I know I am going to be downvoted for mentioning a cursor.
Upvotes: 0