Reputation: 1793
What is the most efficient way to count in how many rows each keyword is present. For example given the following list:
cat, dog, basement, parrot, whale, dolphins
And given the following simple database:
ID TEXTDATA
1 The cat is in the basement, the cat is not on the roof.
2 The dog is going home.
3 The birds are flying in the sky.
4 The dolphins are swimming in the ocean, i really like dolphins.
5 The mouse live in the basement.
The results should be:
cat=1 ( two occurrences in one row )
dog=1 ( one occurrence in one row )
basement=2 ( two occurrences in two rows )
parrot=0 ( zero occurrences )
whale=0 ( zero occurrences )
dolphins=1 ( two occurrence in one row )
The results may be computed in coldfusion or mysql whichever is more efficient.
Upvotes: 3
Views: 145
Reputation: 480
Firstly, I simulated the dataset via QuerySim (http://cflib.org/udf/QuerySim).
<cfscript>
data = querySim('
ID , TEXTDATA
1|The cat is in the basement, the cat is not on the roof.
2|The dog is going home.
3|The birds are flying in the sky.
4|The dolphins are swimming in the ocean, i really like dolphins.
5|The mouse live in the basement.
');
</cfscript>
A nice data dump to verify.
<cfdump var="#data#">
And a pair of loops.
<cfloop list="cat,dog,basement,parrot,whale,dolphins" index="i" >
<cfset count = 0>
<cfloop query="data" >
<cfif ListFind(REReplaceNoCase(TEXTDATA, '[^a-z0-9]', ' ', 'all'), variables.i, " ")>
<cfset count++>
</cfif>
</cfloop>
<cfoutput>#variables.i#=#variables.count#</cfoutput><br>
</cfloop>
Is it the most efficient? Not sure. I just wrapped another CFLOOP and looped it 10000 times:
{ts '2016-06-20 10:21:35'} {ts '2016-06-20 10:21:44'}
Nine seconds.
Upvotes: 1
Reputation: 20794
Try nested loops.
<cfset wordCount = ArrayNew(2)>
<cfset arrayRow = 0>
<cfset words = "cat,dog,etc">
<cfloop list = "#words# index = "thisWord">
<cfset arrayRow ++>
<cfset wordCount[arrayRow][1] = thisWord>
<cfset wordCount[arrayRow[2] = 0>
<cfloop query = "yourQuery">
<cfif listfind(textdata, thisWord, " "> <!--- space delimited --->
<cfset wordCount[arrayRow[2] += 1>
closing tags
Upvotes: 1
Reputation: 33935
SELECT x.keyword
, SUM(textdata RLIKE CONCAT('[[:<:]]',CAST(x.keyword AS CHAR CHARACTER SET utf8),'[[:>:]]')) cnt FROM my_table
JOIN (
SELECT 'cat' keyword
UNION
SELECT 'dog'
UNION
SELECT 'basement'
UNION
SELECT 'parrot'
UNION
SELECT 'whale'
UNION
SELECT 'dolphins'
) x
GROUP BY keyword;
Upvotes: 1