Reputation: 186
I need to count words in string in snowflake. the problem is that between the words there are spaces, in variety count.
for example:
str = '67778 19gj5 7770 202ty524 2024 i900 3290 POC/*'
I want to get the number of words- 8.
what I tried:
array_size(split(str, ' '))
return 27 :(
and
array_size(split(str, '\t'))
return 1.
any idea please?
Upvotes: 1
Views: 3082
Reputation: 38335
This is implementation of @NickW suggestion with some improvement:
Replace all consecutive whitespace characters (\\s+
) with single space and split it, use array_size, it will work with all whitespace: spaces, tabs, newlines, etc:
WITH t1 AS (
select '67778 19gj5 7770 202ty524 2024 i900 3290 POC/*' as str
)
select array_size(split(regexp_replace(str, '\\s+',' '),' ')) num_words
from t1
Result:
NUM_WORDS
8
Upvotes: 2
Reputation: 175826
You could use SPLIT_TO_TABLE
and COUNT_IF
:
CREATE OR REPLACE TABLE t AS
SELECT '67778 19gj5 7770 202ty524 2024 i900 3290 POC/*' AS str;
Query:
SELECT STR, COUNT_IF(s.VALUE != '') AS num_of_words
FROM t,
LATERAL SPLIT_TO_TABLE(t.str, ' ')s
GROUP BY STR;
Output:
STR num_of_words
67778 19gj5 7770 202ty524 2024 i900 3290 POC/* 8
EDIT:
Alternatively using REGEXP_COUNT
:
SELECT *, REGEXP_COUNT(str, '\\b\S+\\b') AS num_of_words
FROM t;
Upvotes: 0
Reputation: 9788
Alternative solution:
Upvotes: 1