Reputation: 1852
I have a very specific query I am currently doing with MySQL
My Table structure is :
id : INT, primary KeY, AUTO_INCREMENT
occupationID : INT
alias : VARCHAR(255)
Then, I do a
SELECT occupationID, (MATCH(alias) AGAINST ('Web Developer' IN NATURAL LANGUAGE MODE) / count(*)) as score FROM aliases group by occupationID order by score DESC LIMIT 0,2;
This query performs a search on every ROW, doing a full scan, and divide matches by their number of occurrences. This way, I got an average score on all rows, giving me the accuracy I need.
This is very slow (20 sec) , with a 50k records table . ( I am not surprised, MySQL fulltext is very slow...).
With Sphinx, I was thinking to build an index with this query:
select id,occupationID,alias, (SELECT count(*) from aliases AS A WHERE B.occupationID=A.occupationID) as nb from aliases AS B
And then do a
$sphinx->setSelect("@id, sum(@weight)/nb as score");
$sphinx->setGroupBy("occupationID", GROUP_BY_ATTR, "score DESC");
and
$sphinx->query("Web Developer");
Am I doing this right?
Upvotes: 0
Views: 1136
Reputation: 21091
Mostly. The only odity I spot, is in the original mysql, you group by occupationID. But in building the index you join with id - meaning the count will be the number with the same id, not the sample occupationID.
I'm also not certain that sum(@weight) works in sphinx - in theory it should, but there are a few odd restrictions, so cant be sure without testing.
Upvotes: 1