Query to display the count of each distinct word

Question

There is a column in a table that can store up to 4000 characters. For a given row, we need to write a query that displays the count of each distinct word in the sentence.

For example, the column contains "Jack and Jill went up a hill. Jack came tumbling down"

Output:

Jack - 2
Jill - 1
hill - 1
and - 1
a - 1
came - 1
... and so on

Maheswaran Ravisankar · Accepted Answer

First, convert the words into rows, and then group them.

This query relies on a basic row-generation technique using CONNECT BY.

For example:

select level from dual CONNECT BY level <= 10;

The above query generates 10 rows (a hierarchical LEVEL query).

Based on this simple logic, we now have to count the number of words in the sentence and generate that many rows. REGEXP_COUNT(str,'[^ ]+') gives the number of words (runs of non-space characters) in the sentence, not the number of spaces.
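
To see what this pattern counts, a minimal check against the sample sentence might look like the following. The column alias token_count is only an illustrative name.

```sql
-- '[^ ]+' matches each run of non-space characters, i.e. each word
SELECT REGEXP_COUNT('Jack and Jill went up a hill. Jack came tumbling down', '[^ ]+') AS token_count
  FROM dual;
-- returns 11 (the sentence has 11 words separated by 10 spaces)
```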

Then, using LEVEL as the occurrence number, extract one word from the sentence in each row. REGEXP_SUBSTR(str,'[^ ]+',1,LEVEL) does this.
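
As a small illustration, assuming the same sample sentence, the fourth argument of REGEXP_SUBSTR picks which match to return, so passing LEVEL there walks through the words one per generated row. The column aliases are illustrative names.

```sql
-- The 4th argument is the occurrence to return; the 3rd is the start position
SELECT REGEXP_SUBSTR('Jack and Jill went up a hill. Jack came tumbling down', '[^ ]+', 1, 1) AS first_word,
       REGEXP_SUBSTR('Jack and Jill went up a hill. Jack came tumbling down', '[^ ]+', 1, 3) AS third_word,
       REGEXP_SUBSTR('Jack and Jill went up a hill. Jack came tumbling down', '[^ ]+', 1, 7) AS seventh_word
  FROM dual;
-- first_word = 'Jack', third_word = 'Jill', seventh_word = 'hill.' (the period is kept)
```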

You can play around with this query to handle other scenarios. Good Luck.

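-- Split the sentence into one word per row, then count each distinct word.
WITH tokenised_rows (str) AS (
  -- LEVEL runs from 1 to the number of words; each row extracts the LEVEL-th word
  SELECT REGEXP_SUBSTR('Jack and Jill went up a hill. Jack came tumbling down', '[^ ]+', 1, LEVEL)
    FROM dual
  CONNECT BY LEVEL <= REGEXP_COUNT('Jack and Jill went up a hill. Jack came tumbling down', '[^ ]+')
)
SELECT str, COUNT(*)
  FROM tokenised_rows
 GROUP BY str;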
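
One scenario worth handling is punctuation: with '[^ ]+' the token "hill." keeps its period. The following is a sketch of one possible variation, not the only approach. It assumes that words consist of alphabetic characters only, so it swaps the pattern for the POSIX class [[:alpha:]]. The src subquery is a stand-in that you would replace with a query returning the column value for the single row of interest; the names src, word and word_count are illustrative.

```sql
WITH src AS (
  -- Stand-in for a query that returns the column value of exactly one row
  SELECT 'Jack and Jill went up a hill. Jack came tumbling down' AS str
    FROM dual
),
tokenised_rows AS (
  -- '[[:alpha:]]+' matches runs of letters, so punctuation is dropped
  SELECT REGEXP_SUBSTR(str, '[[:alpha:]]+', 1, LEVEL) AS word
    FROM src
  CONNECT BY LEVEL <= REGEXP_COUNT(str, '[[:alpha:]]+')
)
SELECT word, COUNT(*) AS word_count
  FROM tokenised_rows
 GROUP BY word
 ORDER BY word_count DESC, word;
-- Jack has a count of 2; every other word, including hill (no period), has 1
```

The grouping is case-sensitive, so "Jack" and "jack" would be counted separately; wrapping the extracted word in LOWER() would merge them. Keeping src to a single row matters because CONNECT BY LEVEL over several rows multiplies the generated rows unless extra conditions are added.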
