rbassett
rbassett

Reputation: 391

How to find all the characters used in a particular column in a table in SQL Server?

Having inherited a database with various string columns in various tables, I'm adding some validation into the code that references the database in order to validate the user data which feeds into these columns. Before I do this I wanted to find out which characters are already used in the columns so I have some idea of how to validate future data.

For example product names in a product table:

CREATE TABLE Products 
(
    [prod_id] [smallint] NOT NULL,
    [prod_name] [nvarchar](50) NOT NULL
)

So I have

CREATE TABLE #letters (letter char(1));

DECLARE @POS INT
SET @POS = 1

WHILE (@POS <= (SELECT MAX(LEN([prod_name])) FROM [Products])) 
BEGIN
    INSERT INTO #letters 
        SELECT SUBSTRING([prod_name], @POS, 1) FROM [Products]

    SET @POS = @POS + 1
END

SELECT DISTINCT letter, CAST(letter AS binary) 
FROM #letters 
ORDER BY letter

DELETE FROM #letters

This is quite clunky and slow, any suggestions for a neater solution?

Thanks

Upvotes: 2

Views: 2194

Answers (1)

WAMLeslie
WAMLeslie

Reputation: 1261

The following will split all of the characters in the column into rows in the CTE, then I've added the counts by character as well. Not sure what your data volume is like, but I tested this against a wide column in a table with almost 1 million rows and the query took under a minute to run.

drop table if exists #products
create table #products (prod_name nvarchar (50) NOT NULL)

insert into #products (prod_name)
values ('product1'),('product2'),('widget1'),('widget2')

;WITH split_products
AS (
    select substring(a.prod_name, v.number+1, 1) as charact
    from (select prod_name from #products b) a
    join master..spt_values v on v.number < len(a.prod_name)
    where v.type = 'P'
    )
select charact, count(*)
from split_products
group by charact
order by charact

Upvotes: 1

Related Questions