Michal
Michal

Reputation: 25

Word anagrammer with sql's %

I need to create a script searching for words with 'blanks', which basically are % in sql.

$numberofblanks = 1; //max 13
$searchedword = "WORD";
$searchedwordsorted = "DORW";

Results given should be:
WORDY WORLD CROWD SWORD WORDS DOWRY ROWED DROWN DOWER ROWDY

%word, w%ord, wo%rd, wor%d, word% would do, but what about more complicated queries, with 2 or more blanks?
Also was wondering if $searchedwordsorted is any helpful or it doesn't really matter and it's just waste of space in my table.

Thank you kindly for your help guys.
.mike

Upvotes: 2

Views: 258

Answers (1)

Mark Byers
Mark Byers

Reputation: 838736

First I want to correct an error in your question. In your queries you mean _ not %. The % means any number of characters (zero or more). Use _ to mean exactly one character.

Now on to the solution... you don't actually need the sorted word stored in the database. You could just do this:

SELECT word
FROM dictionary
WHERE CHAR_LENGTH(word) = 6
AND word LIKE '%W%'
AND word LIKE '%O%'
AND word LIKE '%R%'
AND word LIKE '%D%'

If you have duplicate letters in your input, need to handle this correctly to ensure that all results contain all the duplicated letters. For example if the input is FOO__ you need to check that each word matches both %F% and %O%O%.

SELECT word
FROM dictionary
WHERE CHAR_LENGTH(word) = 5
AND word LIKE '%F%'
AND word LIKE '%O%O%'

Note that this approach will require a full scan of the table so it will not be particularly efficient. You could improve things slightly by storing the length of each word in a separate column and indexing that column.


If you have sortedword then you can improve performance by omitting the % between duplicated letters since you know that they will appear consecutively in sortedword. This could improve performance bceause it reduces the amount of backtracking required for failed matches.

SELECT word
FROM dictionary
WHERE CHAR_LENGTH(word) = 5
AND sortedword LIKE '%F%'
AND sortedword LIKE '%OO%'

Another approach that requires sortedword to be present is as follows:

SELECT word
FROM dictionary
WHERE CHAR_LENGTH(word) = 5
AND sortedword LIKE '%D%O%R%W%'

Again this requires a full scan of the table. Again, if you have repeated letters you don't need the % between them.

SELECT word
FROM dictionary
WHERE CHAR_LENGTH(word) = 5
AND sortedword LIKE '%F%OO%'

Upvotes: 3

Related Questions