Reputation: 37
I have a column called Physical-address
that I need to parse into it's own columns so that I need import it into separate address fields. The data in the column is like this:
Here are some examples:
1) 6453 W Church Dr, Summitville, CA, 46741, Brown
2) 2456 North Street, Greensboro, NC 27403
3) 5847 East Wood Dr, VA
4) 530 Alhambra Dr. Apt. A1 Anderson IN 46012
5) 1511 W 9th St, Anderson, IN, 46016
6) 2900 W 22nd St, Anderson, IN, 46013, Madison
7) Community Hospital, 1515 N Madison Ave, Anderson, IN, 46011
Result:
1) 46741
2) 27403
3) NULL
4) 46012
5) 46016
6) 46013
7) 46011
Thanks in advance!
Upvotes: 2
Views: 95
Reputation: 67291
You might try this (thx, gotqn for the sample data!)
DECLARE @DataSource TABLE
(
ID INT IDENTITY,
[value] NVARCHAR(MAX)
);
INSERT INTO @DataSource ([value])
VALUES ('6453 W Church Dr, Summitville, CA, 46741, Brown')
,('2456 North Street, Greensboro, NC 27403')
,('5847 East Wood Dr, VA')
,('530 Alhambra Dr. Apt. A1 Anderson IN 46012')
,('1511 W 9th St, Anderson, IN, 46016')
,('2900 W 22nd St, Anderson, IN, 46013, Madison')
,('Community Hospital, 1515 N Madison Ave, Anderson, IN, 46011');
--The query will find all integers in the strings and return those in the intervall needed
SELECT ID,[value]
,fragment.value('text()[1]','int')
FROM @DataSource ds
CROSS APPLY(SELECT CAST('<x>' + REPLACE(REPLACE((SELECT [value] AS [*] FOR XML PATH('')),',',' '),' ','</x><x>') + '</x>' AS XML)) A(Casted)
CROSS APPLY A.Casted.nodes('/x[not(empty(. cast as xs:int?))]') B(fragment)
WHERE fragment.value('text()[1]','int') BETWEEN 10000 AND 99999;
the first CROSS APPLY
will replace all commas to blanks and then split the string at all blanks (by transforming a b c
to <x>a</x><x>b</x><x>c</x>
).
The magic happens in the second CROSS APPLY
:
.nodes('/x[not(empty(. cast as xs:int?))]')
will return fragments, where the cast to xs:int?
does not return empty. In other words: All values which are numbers.
As we return only numbers, we can savely use int
in the .value()
-method and can therefore simply check the interval using BETWEEN
.
Thx to HABO you might change this slightly:
SELECT ID,[value]
,fragment.value('text()[1]','nvarchar(max)')
FROM @DataSource ds
CROSS APPLY(SELECT CAST('<x>' + REPLACE(REPLACE((SELECT [value] AS [*] FOR XML PATH('')),',',' '),' ','</x><x>') + '</x>' AS XML)) A(Casted)
CROSS APPLY A.Casted.nodes('/x[not(empty(. cast as xs:int?))]') B(fragment)
WHERE LEN(fragment.value('text()[1]','nvarchar(max)'))=5;
the idea is still the same: Find pure numbers. But the result is handled as string (not omiting leading zeros) and the filter is checking for a length of 5.
Upvotes: 4
Reputation: 43636
Assuming you are using at least SQL Server 2012, the following is full working example. If you are using earlier version, replace the IIF
with CASE WHEN
and the TRY_CONVERT
with other technique for checking if a string is a number (you can use pathindex to check if there are any not numeric characters, for example).
DECLARE @DataSource TABLE
(
[value] NVARCHAR(MAX)
);
INSERT INTO @DataSource ([value])
VALUES ('6453 W Church Dr, Summitville, CA, 46741, Brown')
,('2456 North Street, Greensboro, NC 27403')
,('5847 East Wood Dr, VA')
,('530 Alhambra Dr. Apt. A1 Anderson IN 46012')
,('1511 W 9th St, Anderson, IN, 46016')
,('2900 W 22nd St, Anderson, IN, 46013, Madison')
,('Community Hospital, 1515 N Madison Ave, Anderson, IN, 46011');
WITH DataSource ([value], [valueXML]) AS
(
SELECT [value]
,CAST(('<X>'+REPLACE(REPLACE([value], ' ', ','),',' ,'</X><X>')+'</X>') AS XML)
FROM @DataSource
),
DataSourceFinal ([value], [number], [rank], [numbers_count]) AS
(
SELECT [value]
,T.c.value('.', 'nvarchar(128)')
,ROW_NUMBER() OVER (PARTITION BY DS.[value] ORDER BY T.c DESC)
,COUNT(T.c.value('.', 'nvarchar(128)')) OVER (PARTITION BY [value])
FROM DataSource DS
CROSS APPLY DS.[valueXML].nodes('X') T(c)
WHERE TRY_CONVERT(BIGINT, T.c.value('.', 'nvarchar(128)')) IS NOT NULL
AND T.c.value('.', 'nvarchar(128)') <> ''
)
SELECT [value]
,IIF([numbers_count] > 1, [number], NULL)
FROM DataSourceFinal
WHERE [rank] = 1;
The idea is simple - replace all empty strings with commas. Then split the newly created CSVs and look for the last number.
This can be solve really easy if you have regex support in your T-SQL. For, example:
WITH DataSource ([value], [number], [rank], [numbers_count]) AS
(
SELECT DS.[value]
,REVERSE(RM.[CaptureValue])
,RM.[MatchID]
,COUNT(RM.[MatchID]) OVER (PARTITION BY DS.[value])
FROM @DataSource DS
CROSS APPLY [dbo].[fn_Utils_RegexMatches] (REVERSE([value]), '\d+') RM
)
SELECT [value]
,IIF([numbers_count] > 1, [number], NULL)
FROM DataSource
WHERE [rank] = 0;
Upvotes: 1