riki
riki

Reputation: 186

Is there a special character for tab indention in snowflake?

I need to count words in string in snowflake. the problem is that between the words there are spaces, in variety count.

for example:

str = '67778        19gj5  7770 202ty524 2024       i900      3290 POC/*'

I want to get the number of words- 8.

what I tried:

array_size(split(str, ' '))

return 27 :(

and

array_size(split(str, '\t'))

return 1.

any idea please?

Upvotes: 1

Views: 3082

Answers (3)

leftjoin
leftjoin

Reputation: 38335

This is implementation of @NickW suggestion with some improvement:

Replace all consecutive whitespace characters (\\s+) with single space and split it, use array_size, it will work with all whitespace: spaces, tabs, newlines, etc:

WITH t1 AS (
select '67778        19gj5  7770 202ty524 2024       i900      3290 POC/*' as str
    )
    
 select array_size(split(regexp_replace(str, '\\s+',' '),' ')) num_words
 from t1

Result:

NUM_WORDS
8

Upvotes: 2

Lukasz Szozda
Lukasz Szozda

Reputation: 175826

You could use SPLIT_TO_TABLE and COUNT_IF:

CREATE OR REPLACE TABLE t AS
SELECT  '67778        19gj5  7770 202ty524 2024       i900      3290 POC/*' AS str;

Query:

SELECT STR, COUNT_IF(s.VALUE != '') AS num_of_words
FROM t,
LATERAL SPLIT_TO_TABLE(t.str, ' ')s
GROUP BY STR;

Output:

STR                                                                 num_of_words
67778        19gj5  7770 202ty524 2024       i900      3290 POC/*   8

EDIT:

Alternatively using REGEXP_COUNT:

SELECT *, REGEXP_COUNT(str, '\\b\S+\\b') AS num_of_words
FROM t;

Upvotes: 0

NickW
NickW

Reputation: 9788

Alternative solution:

  1. Use regex_replace to replace all occurrences of multiple spaces with a single space
  2. Split the string using a single space as the delimiter (i.e. what you tried originally)

Upvotes: 1

Related Questions