Nebu
Nebu

Reputation: 1793

Count how often a row contains a keyword

What is the most efficient way to count in how many rows each keyword is present. For example given the following list:

cat, dog, basement, parrot, whale, dolphins

And given the following simple database:

ID  TEXTDATA
1   The cat is in the basement, the cat is not on the roof.
2   The dog is going home.
3   The birds are flying in the sky.
4   The dolphins are swimming in the ocean, i really like dolphins.
5   The mouse live in the basement.

The results should be:

cat=1 ( two occurrences in one row )
dog=1 ( one occurrence in one row )
basement=2 ( two occurrences in two rows )
parrot=0 ( zero occurrences )
whale=0 ( zero occurrences )
dolphins=1 ( two occurrence in one row )

The results may be computed in coldfusion or mysql whichever is more efficient.

Upvotes: 3

Views: 145

Answers (3)

Sung
Sung

Reputation: 480

Firstly, I simulated the dataset via QuerySim (http://cflib.org/udf/QuerySim).

<cfscript>
data = querySim('
ID , TEXTDATA
1|The cat is in the basement, the cat is not on the roof.
2|The dog is going home.
3|The birds are flying in the sky.
4|The dolphins are swimming in the ocean, i really like dolphins.
5|The mouse live in the basement.
');
</cfscript>

A nice data dump to verify.

<cfdump var="#data#">

And a pair of loops.

<cfloop list="cat,dog,basement,parrot,whale,dolphins" index="i" >
    <cfset count = 0>
    <cfloop query="data" >
        <cfif ListFind(REReplaceNoCase(TEXTDATA, '[^a-z0-9]', ' ', 'all'), variables.i, " ")>
            <cfset count++>
        </cfif>
    </cfloop>
    <cfoutput>#variables.i#=#variables.count#</cfoutput><br>
</cfloop> 

Is it the most efficient? Not sure. I just wrapped another CFLOOP and looped it 10000 times:

{ts '2016-06-20 10:21:35'} {ts '2016-06-20 10:21:44'}

Nine seconds.

Upvotes: 1

Dan Bracuk
Dan Bracuk

Reputation: 20794

Try nested loops.

<cfset wordCount = ArrayNew(2)>
<cfset arrayRow = 0>
<cfset words = "cat,dog,etc">
<cfloop list = "#words# index = "thisWord">
<cfset arrayRow ++>
<cfset wordCount[arrayRow][1] = thisWord>
<cfset wordCount[arrayRow[2] = 0>

<cfloop query = "yourQuery">
<cfif listfind(textdata, thisWord, " "> <!--- space delimited --->
<cfset wordCount[arrayRow[2] += 1>
closing tags

Upvotes: 1

Strawberry
Strawberry

Reputation: 33935

SELECT x.keyword
     , SUM(textdata RLIKE CONCAT('[[:<:]]',CAST(x.keyword AS CHAR CHARACTER SET utf8),'[[:>:]]')) cnt FROM my_table
JOIN (
SELECT 'cat' keyword
 UNION
SELECT 'dog'
 UNION
SELECT 'basement'
 UNION
SELECT 'parrot'
 UNION
SELECT 'whale'
 UNION
SELECT 'dolphins'
) x 
GROUP BY keyword;

Upvotes: 1

Related Questions