Reputation: 13213
I have a table with a column called Description
. The column is populated with text data. I want to create a query that returns the amount of words in each description.
My thought was to create a function that takes in a value, and returns the amount of words found in the inputted text.
SELECT dbo.GetWordCount(Description) FROM TABLE
For example, if the description is "Hello World! Have a nice day.", the query should return 6.
How can I get the word count of the description column?
Upvotes: 1
Views: 14986
Reputation: 553
requisites: SQL Server 2016 and later
I use this in my sp , I receive a sentence, so i can handle inner spaces.
SELECT value from STRING_SPLIT(@oracion1,' ')
now I filter for values with text and count them for to achieve this :
SELECT count(value) from STRING_SPLIT(@str,' ') where len(value)>0
@oracion1 colud be N"JUAN ES CARPINTERO " or @oracion1 could be N"JUAN ES CARPINTERO "
Upvotes: 0
Reputation: 8746
This answer is based on the same code used in Mortalus's answer, which I originally found here.
This solution is a more efficient and more concise version of that code. I've also add some explanation for the code that will hopefully make this answer clearer for future readers.
The following user defined function takes in a string of text, and then loops through the each character of the inputted text. If the previous character was a space, the word count is increased by one.
Since the word count is calculated by counting the spaces between the words, there will always be 1 less space than actual words. To counteract this, start @PrevChar
with the value of ' '
. Then, when the loop is run for the first time, when the code then reaches IF @PrevChar = ' '
, it will return true, and the word count will be increase by one. This works even if the text has a length of 0, since in that case, it just won't get passed the @Index <= LEN(@InputString)
check, and the word count will never be increased. (This replaces the CASE
statement used in the linked answer.)
AND @CurrentChar != ' '
is used to solve the problem of double spacing being counted as multiple words. If the previous character is a space, but the current character is also a space, move on to the next index without increasing the word count. The next iteration will then only have @PrevChar
set to ' '
, and so the word count will only be increase once for the double space.
CREATE FUNCTION [dbo].[WordCount] (@InputString VARCHAR(MAX))
RETURNS INT
AS
BEGIN
DECLARE @Index INT = 1
DECLARE @CurrentChar CHAR(1)
--Initialize the previous character as a space.
DECLARE @PrevChar CHAR(1) = ' '
DECLARE @WordCount INT = 0
WHILE @Index <= LEN(@InputString)
BEGIN
--Set the current character to equal the character in the index
--position of the inputted text.
SET @CurrentChar= SUBSTRING(@InputString, @Index, 1)
--If the previous character was a space and the current character
--is not a space, increase the wordcount by 1.
IF @PrevChar = ' ' AND @CurrentChar != ' '
SET @WordCount = @WordCount + 1
--Increase the index counter by 1.
SET @Index = @Index + 1
--Now that we are done with the current character, set the previous
--character to equal the current character.
SET @PrevChar = @CurrentChar
END
RETURN @WordCount
END
Upvotes: 0
Reputation: 41
Generalized Syntax:
SELECT (LENGTH(column_name) - LENGTH(REPLACE(column_name, ' ', ''))),column_name1,column_name2 FROM table_name;
In case, if you want to calculate how many words are there in single 'address' column of a table named 'employeeDetails' then:
SELECT (LENGTH(address) - LENGTH(REPLACE(address, ' ', ''))),address,employee_name FROM employeeDetails ;
Upvotes: 4
Reputation: 1385
In addition to Mortalus's answer I'd use an inline function rather than scalar (*Note - this function will work from SQL Server 2012 and up) for Previous versions of SQL Server see below:
/*SQL Server 2012 and up*/
CREATE FUNCTION dbo.udf_WordCount
(
@str VARCHAR(8000)
)
RETURNS TABLE AS RETURN
WITH Tally (n) AS
(
SELECT TOP (LEN(@str)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0)) a(n)
CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) b(n)
CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) c(n)
CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d(n)
)
, BreakChar as
(
SELECT SUBSTRING(@str , n , 1) [Char] , N
FROM Tally
)
, Analize as
(
SELECT * , lag([Char],1) OVER (ORDER BY N) PrevChar
FROM BreakChar
)
SELECT WordCount = COUNT(1) + 1
FROM Analize
WHERE [Char] != PrevChar
AND PrevChar = ' '
How to Use:
DECLARE @str varchar(1000) = 'It''s now or never I ain''t gonna live forever'
SELECT * FROM dbo.udf_WordCount(@str) --> 9
**SQL Server 2008 and lower:
/*SQL Server 2008 and down*/
CREATE FUNCTION dbo.udf_WordCount_2008
(
--declare
@str VARCHAR(8000)
--= 'It''s now or never I ain''t gonna live forever'
)
RETURNS TABLE AS RETURN
WITH Tally (n) AS
(
SELECT TOP (LEN(@str)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0)) a(n)
CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) b(n)
CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) c(n)
CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d(n)
)
, BreakChar as
(
SELECT SUBSTRING(@str , n , 1) [Char] , N
FROM Tally
)
, Analize as
(
SELECT a.* , b.Char PrevChar
FROM BreakChar a
JOIN BreakChar b
on a.n = b.n+1
)
SELECT WordCount = COUNT(1) + 1
FROM Analize
WHERE [Char] != PrevChar
AND PrevChar = ' '
Upvotes: 0
Reputation: 31
This is a little cumbersome but it handles the whitespace issue nicely, its fast and inline, no udf.
DECLARE @Term VARCHAR(100) = ' this is pretty fast '
SELECT @Term, LEN(REPLACE(REPLACE(REPLACE(' '+@Term,' ',' '+CHAR(1)) ,CHAR(1)+' ',''),CHAR(1),'')) - LEN(REPLACE(REPLACE(REPLACE(REPLACE(' '+@Term,' ',' '+CHAR(1)) ,CHAR(1)+' ',''),CHAR(1),''),' ','')) [Word Count]
Upvotes: 2
Reputation: 10712
See this proposed solution: http://www.sql-server-helper.com/functions/count-words.aspx
CREATE FUNCTION [dbo].[WordCount] ( @InputString VARCHAR(4000) )
RETURNS INT
AS
BEGIN
DECLARE @Index INT
DECLARE @Char CHAR(1)
DECLARE @PrevChar CHAR(1)
DECLARE @WordCount INT
SET @Index = 1
SET @WordCount = 0
WHILE @Index <= LEN(@InputString)
BEGIN
SET @Char = SUBSTRING(@InputString, @Index, 1)
SET @PrevChar = CASE WHEN @Index = 1 THEN ' '
ELSE SUBSTRING(@InputString, @Index - 1, 1)
END
IF @PrevChar = ' ' AND @Char != ' '
SET @WordCount = @WordCount + 1
SET @Index = @Index + 1
END
RETURN @WordCount
END
GO
Usage Example:
DECLARE @String VARCHAR(4000)
SET @String = 'Health Insurance is an insurance against expenses incurred through illness of the insured.'
SELECT [dbo].[WordCount] ( @String )
Upvotes: 2