Control Freak
Control Freak

Reputation: 13213

Get Word Count of a Column using SQL

I have a table with a column called Description. The column is populated with text data. I want to create a query that returns the amount of words in each description.

My thought was to create a function that takes in a value, and returns the amount of words found in the inputted text.

SELECT dbo.GetWordCount(Description) FROM TABLE

For example, if the description is "Hello World! Have a nice day.", the query should return 6.

How can I get the word count of the description column?

Upvotes: 1

Views: 14986

Answers (6)

qleoz12
qleoz12

Reputation: 553

requisites: SQL Server 2016 and later

I use this in my sp , I receive a sentence, so i can handle inner spaces.

enter image description here

SELECT value from STRING_SPLIT(@oracion1,' ')

now I filter for values with text and count them for to achieve this :

enter image description here

SELECT count(value) from STRING_SPLIT(@str,' ') where len(value)>0

@oracion1 colud be N"JUAN ES CARPINTERO " or @oracion1 could be N"JUAN ES CARPINTERO "

Upvotes: 0

Tot Zam
Tot Zam

Reputation: 8746

This answer is based on the same code used in Mortalus's answer, which I originally found here.

This solution is a more efficient and more concise version of that code. I've also add some explanation for the code that will hopefully make this answer clearer for future readers.


The following user defined function takes in a string of text, and then loops through the each character of the inputted text. If the previous character was a space, the word count is increased by one.

Since the word count is calculated by counting the spaces between the words, there will always be 1 less space than actual words. To counteract this, start @PrevChar with the value of ' '. Then, when the loop is run for the first time, when the code then reaches IF @PrevChar = ' ', it will return true, and the word count will be increase by one. This works even if the text has a length of 0, since in that case, it just won't get passed the @Index <= LEN(@InputString) check, and the word count will never be increased. (This replaces the CASE statement used in the linked answer.)

AND @CurrentChar != ' ' is used to solve the problem of double spacing being counted as multiple words. If the previous character is a space, but the current character is also a space, move on to the next index without increasing the word count. The next iteration will then only have @PrevChar set to ' ', and so the word count will only be increase once for the double space.

CREATE FUNCTION [dbo].[WordCount] (@InputString VARCHAR(MAX))
RETURNS INT
AS
BEGIN
    DECLARE @Index INT = 1
    DECLARE @CurrentChar CHAR(1)

    --Initialize the previous character as a space.
    DECLARE @PrevChar CHAR(1) = ' '

    DECLARE @WordCount INT = 0

    WHILE @Index <= LEN(@InputString)
    BEGIN
        --Set the current character to equal the character in the index 
        --position of the inputted text.
        SET @CurrentChar= SUBSTRING(@InputString, @Index, 1)

        --If the previous character was a space and the current character
        --is not a space, increase the wordcount by 1.
        IF @PrevChar = ' ' AND @CurrentChar != ' '
            SET @WordCount = @WordCount + 1

        --Increase the index counter by 1.
        SET @Index = @Index + 1

        --Now that we are done with the current character, set the previous
        --character to equal the current character.
        SET @PrevChar = @CurrentChar
    END

    RETURN @WordCount
END

Upvotes: 0

Tushar Purohit
Tushar Purohit

Reputation: 41

Generalized Syntax:

SELECT (LENGTH(column_name) - LENGTH(REPLACE(column_name, ' ', ''))),column_name1,column_name2 FROM table_name;

In case, if you want to calculate how many words are there in single 'address' column of a table named 'employeeDetails' then:

SELECT (LENGTH(address) - LENGTH(REPLACE(address, ' ', ''))),address,employee_name FROM employeeDetails ;

Upvotes: 4

hkravitz
hkravitz

Reputation: 1385

In addition to Mortalus's answer I'd use an inline function rather than scalar (*Note - this function will work from SQL Server 2012 and up) for Previous versions of SQL Server see below:

    /*SQL Server 2012 and up*/
    CREATE FUNCTION dbo.udf_WordCount 
    (

    @str VARCHAR(8000) 

    )
    RETURNS TABLE AS RETURN

    WITH Tally (n) AS
    (
        SELECT TOP (LEN(@str)) ROW_NUMBER()  OVER (ORDER BY (SELECT NULL)) 
        FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0)) a(n)
        CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) b(n)
        CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) c(n)
        CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d(n)
    )
    , BreakChar as
    (
        SELECT  SUBSTRING(@str , n , 1) [Char] , N
        FROM Tally

    )
    , Analize as 
    (
        SELECT * , lag([Char],1) OVER (ORDER BY N) PrevChar
        FROM BreakChar
    )

        SELECT WordCount = COUNT(1) + 1 
        FROM Analize
        WHERE [Char] != PrevChar
        AND PrevChar = ' ' 

How to Use:

    DECLARE @str varchar(1000) = 'It''s now or never I ain''t gonna live forever'
    SELECT * FROM dbo.udf_WordCount(@str) --> 9 

**SQL Server 2008 and lower:

    /*SQL Server 2008 and down*/
    CREATE FUNCTION dbo.udf_WordCount_2008 
    (
    --declare
    @str VARCHAR(8000) 
    --= 'It''s now or never I ain''t gonna live forever'
    )
    RETURNS TABLE AS RETURN

    WITH Tally (n) AS
    (
        SELECT TOP (LEN(@str)) ROW_NUMBER()  OVER (ORDER BY (SELECT NULL)) 
        FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0)) a(n)
        CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) b(n)
        CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) c(n)
        CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d(n)
    )
    , BreakChar as
    (
        SELECT  SUBSTRING(@str , n , 1) [Char] , N
        FROM Tally

    )
    , Analize as 
    (
        SELECT a.* , b.Char PrevChar
        FROM BreakChar a
        JOIN BreakChar b
        on a.n = b.n+1


    )

        SELECT WordCount = COUNT(1) + 1 
        FROM Analize
        WHERE [Char] != PrevChar
        AND PrevChar = ' ' 

Upvotes: 0

Nate Reynolds
Nate Reynolds

Reputation: 31

This is a little cumbersome but it handles the whitespace issue nicely, its fast and inline, no udf.

DECLARE @Term VARCHAR(100) = '  this   is   pretty fast '

SELECT @Term, LEN(REPLACE(REPLACE(REPLACE(' '+@Term,'  ',' '+CHAR(1)) ,CHAR(1)+' ',''),CHAR(1),'')) -  LEN(REPLACE(REPLACE(REPLACE(REPLACE(' '+@Term,'  ',' '+CHAR(1)) ,CHAR(1)+' ',''),CHAR(1),''),' ','')) [Word Count]

Upvotes: 2

Mortalus
Mortalus

Reputation: 10712

See this proposed solution: http://www.sql-server-helper.com/functions/count-words.aspx

CREATE FUNCTION [dbo].[WordCount] ( @InputString VARCHAR(4000) ) 
RETURNS INT
AS
BEGIN

DECLARE @Index          INT
DECLARE @Char           CHAR(1)
DECLARE @PrevChar       CHAR(1)
DECLARE @WordCount      INT

SET @Index = 1
SET @WordCount = 0

WHILE @Index <= LEN(@InputString)
BEGIN
    SET @Char     = SUBSTRING(@InputString, @Index, 1)
    SET @PrevChar = CASE WHEN @Index = 1 THEN ' '
                         ELSE SUBSTRING(@InputString, @Index - 1, 1)
                    END

    IF @PrevChar = ' ' AND @Char != ' '
        SET @WordCount = @WordCount + 1

    SET @Index = @Index + 1
END

RETURN @WordCount

END
GO

Usage Example:

DECLARE @String VARCHAR(4000)
SET @String = 'Health Insurance is an insurance against expenses incurred through illness of the insured.'

SELECT [dbo].[WordCount] ( @String )

Upvotes: 2

Related Questions