Reputation: 173

Selecting rows that have certain number of specific type of character in a column

Microsoft SQL Server

I need a query that will return all rows that greater than X number of commas exist in the description column. Like won't work because the commas would be spread out and the in between text would be different. I'm not certain this query exists at all.

Any insight or assistance with that would be greatly appreciated.

Thank you for your time.

Upvotes: 2

Answers (2)

Hannah Vernon

Reputation: 3472

In case you are interested in performance, and would like to implement a slightly more complicated approach, I mocked up some data and did a relative simple test:

CREATE TABLE dbo.Test
(
    TestID INT NOT NULL CONSTRAINT PK_Test
        PRIMARY KEY CLUSTERED IDENTITY(1,1)
    , col1 VARCHAR(255) NOT NULL
    , col1_comma_count AS LEN(col1) - LEN(REPLACE(col1, ',','')) PERSISTED
);


INSERT INTO Test (col1) VALUES ('this, is, a, test');
GO 50000
INSERT INTO Test (col1) VALUES ('this, is, a, test, another, test');
GO 1500

The statements above create a test table with a computed column that contains the count of the number of commas in col1. The table then has 50,000 rows inserted where the comma count is 3, and 1,500 rows inserted where the comma count is 5.

I then executed the following query, with SET STATISTICS IO ON; SET STATISTICS TIME ON;:

SELECT COUNT(1)
FROM dbo.Test t
WHERE t.col1_comma_count = 5;

The statistics info:

enter image description here

As you can see, 248 logical reads are necessary to scan the entire table to obtain the count of rows where 5 commas. The execution plan for this query looks like:

enter image description here

As expected, SQL Server does a clustered index scan of the entire table.

I then created an index on the persisted computed column, to show the difference:

CREATE INDEX IX_Test_col1_comma_count ON dbo.Test (col1_comma_count);

and re-ran the test query. Here are the statistics info for the run with the index:

enter image description here

The number of reads necessary has dropped to 6, or 41 times less reads. On a busy system this will make a real difference. Here is the new execution plan:

enter image description here

This time, we see a much more efficient seek into the index.

If we drop both the index and the computed column from the table, we see a huge increase in time spent getting the results of the query:

DROP INDEX IX_Test_col1_comma_count ON dbo.Test;
ALTER TABLE Test DROP COLUMN col1_comma_count;

SELECT COUNT(1)
FROM dbo.Test t
WHERE LEN(col1) - LEN(REPLACE(col1, ',','')) = 5

STATISTICS TIME ON shows a value on my computer (an Intel Core-i7 3.4Ghz with 8GB ram) of SQL Server Execution Times: CPU time = 15 ms, elapsed time = 24 ms.

With the index and computed, persisted column in place, the SQL Server execution times are SQL Server Execution Times: CPU time = 0 ms, elapsed time = 2 ms.

Clearly, there is a price to pay for doing string manipulation in the WHERE clause.

Upvotes: 1

Mureinik

Reputation: 311458

One way of counting the number of times a character appears in a string is to compare the string's length to the length of the string with this character removed.

So, e.g., assuming you want to find all the rows with 5 commas in col1:

SELECT *
FROM   my_table
WHERE  LEN(col1) - LEN(REPLACE(col1, ',', '')) = 5

Upvotes: 4

Selecting rows that have certain number of specific type of character in a column

Answers (2)

Related Questions