zzzbbx
zzzbbx

Reputation: 10161

SQL count number of words in field

I'd like to make an SQL query where the condition is that column1 contains three or more words. Is there something to do that?

Upvotes: 14

Views: 80961

Answers (11)

Rob R
Rob R

Reputation: 1

I like @ashleydawg's answer with REGEX but stackoverflow won't let me comment or like it due to "reputation points".. so annoying!

Only thing to change there -- add 1 to the result. In this example, it returns 3 instead of 4.

set @mystring = "This is four words";
select LENGTH(REGEXP_REPLACE(@mystring, '[ \n\t\|\-]{1,}',' ')) - 
  LENGTH(REGEXP_REPLACE(@mystring, '[ \n\t\|\-]{1,}', '')) wordCount

Upvotes: 0

ashleedawg
ashleedawg

Reputation: 21657

None of the other answers seem to take multiple spaces into account. For example, a lot of people use two spaces between sentences; these space-counters would count an extra word per sentence. "Also, scenarios such as spaces around a hyphen - like that. "

For my purposes, this was far more accurate:

SELECT 
  LENGTH(REGEXP_REPLACE(myText, '[ \n\t\|\-]{1,}',' ')) - 
  LENGTH(REGEXP_REPLACE(myText, '[ \n\t\|\-]{1,}', '')) wordCount FROM myTable;

It counts any sets of 1 or more consecutive characters from any of: [space, linefeed, tab, pipe, or hyphen] and counts it as one word.

Upvotes: 1

iamprageeth
iamprageeth

Reputation: 43

With ClickHouse DB You can use splitByWhitespace() function.

Refer : https://clickhouse.com/docs/en/sql-reference/functions/splitting-merging-functions#splitbywhitespaces

Upvotes: 1

AmericanTourist
AmericanTourist

Reputation: 93

I think David nailed it above. However, as a more complete answer:

LENGTH(RTRIM(LTRIM(REPLACE(column1,'  ', ' ')))) - LENGTH(REPLACE(RTRIM(LTRIM(REPLACE(column1, '  ', ' '))), ' ', '')) + 1 AS number_of_words

This will remove double spaces, as well as leading and trailing spaces in your string.

Of course, you may go further by adding replacements for more than 2 spaces in a row...

Upvotes: 5

David Chan
David Chan

Reputation: 7505

maybe try counting spaces ?

SELECT * 
FROM table
WHERE (LENGTH(column1) - LENGTH(replace(column1, ' ', ''))) > 1

and assume words is number of spaces + 1

Upvotes: 34

Madhivanan
Madhivanan

Reputation: 13700

To handle multiple spaces too, use the method shown here

Declare @s varchar(100)
set @s='  See      how many                        words this      has  '
set @s=ltrim(rtrim(@s))

while charindex('  ',@s)>0
Begin
    set @s=replace(@s,'  ',' ')
end

select len(@s)-len(replace(@s,' ',''))+1 as word_count

https://exploresql.com/2018/07/31/how-to-count-number-of-words-in-a-sentence/

Upvotes: -1

Trideep Rath
Trideep Rath

Reputation: 3703

For "n" or more words

select *
from table
where (length(column)- length(replace(column, " ", "")) + 1) >= n

PS: This would not work if words have multiple spaces between them.

Upvotes: 1

Evan de la Cruz
Evan de la Cruz

Reputation: 1966

The best way to do this, is to NOT do this.

Instead, you should use the application layer to count the words during INSERT and save the word count into its own column.

While I like, and upvoted, some of the answers here, all of them will be very slow and not 100% accurate.

I know people want a simple answer to SELECT the word count, but it just is NOT POSSIBLE with accuracy and speed.

If you want it to be 100% accurate, and very fast, then use this solution.

Steps to solve:

  1. Add a column to your table and index it: ALTER TABLE tablename ADD COLUMN wordcount INT UNSIGNED NULL, ADD INDEX idxtablename_count (wordcount ASC);.
  2. Before doing your INSERT, count the number of words using your application. For example in PHP: $count = str_word_count($somevalue);
  3. During the INSERT, include the value of $count for the column wordcount like insert into tablename (col1, col2, col3, wordcount) values (val1, val2, val3, $count);

Then your select statement becomes super easy, clean, uber-fast, and 100% accurate.

select * from tablename where wordcount >= 3;

Also remember when you are updating any rows that you will need to recount the words for that column.

Upvotes: 3

user330315
user330315

Reputation:

In Postgres you can use regexp_split_to_array() for this:

select *
from the_table
where array_length(regexp_split_to_array(the_column, '\s+'), 1) >= 3;

This will split the contents of the column the_column into array elements. One ore more whitespace are used as the delimiter. It won't respect "quoted" spaces though. The value 'one "two three" four' will be counted as four words.

Upvotes: 4

Gordon Linoff
Gordon Linoff

Reputation: 1271231

If you want a condition that a column contains three or more words and you want it to work in a bunch of databases and we assume that words are separated by single spaces, then you can use like:

where column1 like '% % %'

Upvotes: 5

Shahar
Shahar

Reputation: 1677

This can work:

SUM(LENGTH(a) - LENGTH(REPLACE(a, ' ', '')) + 1)

Where a is the string column. It will count the number of spaces, which is 1 less than the number of words.

Upvotes: 0

Related Questions