Reputation: 391
Having inherited a database with various string columns in various tables, I'm adding some validation into the code that references the database in order to validate the user data which feeds into these columns. Before I do this I wanted to find out which characters are already used in the columns so I have some idea of how to validate future data.
For example product names in a product table:
CREATE TABLE Products
(
[prod_id] [smallint] NOT NULL,
[prod_name] [nvarchar](50) NOT NULL
)
So I have
CREATE TABLE #letters (letter char(1));
DECLARE @POS INT
SET @POS = 1
WHILE (@POS <= (SELECT MAX(LEN([prod_name])) FROM [Products]))
BEGIN
INSERT INTO #letters
SELECT SUBSTRING([prod_name], @POS, 1) FROM [Products]
SET @POS = @POS + 1
END
SELECT DISTINCT letter, CAST(letter AS binary)
FROM #letters
ORDER BY letter
DELETE FROM #letters
This is quite clunky and slow, any suggestions for a neater solution?
Thanks
Upvotes: 2
Views: 2194
Reputation: 1261
The following will split all of the characters in the column into rows in the CTE, then I've added the counts by character as well. Not sure what your data volume is like, but I tested this against a wide column in a table with almost 1 million rows and the query took under a minute to run.
drop table if exists #products
create table #products (prod_name nvarchar (50) NOT NULL)
insert into #products (prod_name)
values ('product1'),('product2'),('widget1'),('widget2')
;WITH split_products
AS (
select substring(a.prod_name, v.number+1, 1) as charact
from (select prod_name from #products b) a
join master..spt_values v on v.number < len(a.prod_name)
where v.type = 'P'
)
select charact, count(*)
from split_products
group by charact
order by charact
Upvotes: 1