Reputation:
I created an anagram creating application, by creating an anagram field in my database, with a lower cased alphabetically stored string.
For example, suction becomes cinostu, ear becomes aer, and so on.
What I want to do now is create sub words from the original anagram searched for.
Example: How would you go about pulling the subset words from a search for 'arrest', i.e. 'rest' and 'stare'.
Upvotes: 3
Views: 1764
Reputation: 31
Andy,
I think you need to convert the ASCII code back into a character - you are indexing the array with letters, yet you are accessing it with ASCII values.
Here's your code, modified slightly:
$LetterCount = array("a" => 1, "b" => 1, "c" => 1, "d" => 1, "e" => 0, "f" => 1, "g" => 1, "h" => 1, "i" => 1, "j" => 1, "k" => 1, "l" => 1, "m" => 1, "n" => 1, "o" => 1, "p" => 1, "q" => 1, "r" => 1, "s" => 1, "t" => 1, "u" => 1, "v" => 1, "w" => 1, "x" => 1, "y" => 1, "z" => 1);
$AsciiCodeLowerCaseA = 97;
for ($j = **0**; $j < strlen($string); $j++) {
$CurrentLetter = $string[$j];
$AsciiCode = ord($CurrentLetter);
$AlphabetPos = **chr($AsciiCode - $AsciiCodeLowerCaseA + 1);**
$LetterCount[$AlphabetPos] = $LetterCount[$AlphabetPos] + 1;
}
Also I just noticed you're indexing the characters in the string from 1, but arrays are zero-idexed.
I think this might be much simpler as well (unless I'm missing something)
for($j = 0; $j < strlen($string); $j++) {
$LetterCount[$string[$j]]++;
}
Upvotes: 0
Reputation: 93
Here's an approach I've used before that makes use of your list of alphabetically-sorted words.
1) Take your target word (arrest) and sort it (aerrst).
2) Then from the sorted word generate new strings where each letter is either included or excluded. For a word of N letters this gives 2**N possible strings. (I don't know PHP but can give you pseudocode or eg Python if you would like.)
For your target word we have: a, e, r, r, s, t, st, rs, rt, rst, rr, rs, rt, rst, rrs, rrt, rrst, er, er, es, et, est, ers, ert, erst, err, ers, ert, erst, errs, errt, errst, ae, ar, ar, as, at, ast, ars, art, arst, arr, ars, art, arst, arrs, arrt, arrst, aer, aer, aes, aet, aest, aers, aert, aerst, aerr, aers, aert, aerst, aerrs, aerrt, aerrst
3) Then check these strings against your sorted list. Those that appear in your sorted list correspond to the subset words you want.
eg aerrst corresponds to full anagrams (arrest, rarest, raster, ...)
eg aerst will be in your sorted list (stare, tears, ...)
eg rrs will not be in your sorted list
Upvotes: 2
Reputation:
I haven't thought about this meaningfully yet, sorry (work to do!), but however you end up generating the words, don't forget that this will cache like a motherlover, so don't go re-generating these on the fly every time someone searches.
CS.
Upvotes: 1
Reputation:
Hey Bork. been trying to adapt your code into PHP, and I have the following:
$LetterCount = array("a" => 1, "b" => 1, "c" => 1, "d" => 1, "e" => 0, "f" => 1, "g" => 1, "h" => 1, "i" => 1, "j" => 1, "k" => 1, "l" => 1, "m" => 1, "n" => 1, "o" => 1, "p" => 1, "q" => 1, "r" => 1, "s" => 1, "t" => 1, "u" => 1, "v" => 1, "w" => 1, "x" => 1, "y" => 1, "z" => 1);
$AsciiCodeLowerCaseA = 97;
for ($j = 1; $j < strlen($string); $j++) {
$CurrentLetter = $string[$j];
$AsciiCode = ord($CurrentLetter);
$AlphabetPos = $AsciiCode - $AsciiCodeLowerCaseA + 1;
$LetterCount[$AlphabetPos] = $LetterCount[$AlphabetPos] + 1;
}
I hardcoded the array declaration bit to save time.
Anyway, It didnt seem to work and gave me this error: Notice: Undefined offset: 1
Here is a screenshot of the errors I am getting, I have also added echos for each var or array in the loop to see if you can understand what is going on.
http://i42.tinypic.com/11ryz4g.png
I think it isnt identifying the aplhabet letter in the array correctly, and thus is adding numbers incorrectly to the end of the array.
Let me know what you think I should do.
Upvotes: 0
Reputation: 3368
This approach is slightly different from yours, but I believe it will be easy to implement programmatically. I'm not sure it's optimal performance wise, but I'll leave that to you :-)
First you need a dictionary of all the legal words you want to be able to match.
Create a "Dictionary" or "Words" table in your database, with the first column storing the actual word, the second column storing the word converted all to upper or lower case for easy comparison, and then one integer column for every letter in the alphabet A-Z.
Import your dictionary file into this table, and programmatically count the number of times each letter of the alphabet appears in this word, and store this number in the column for that letter.
Example Word: bookkeeper
Store the word "bookkeeper" in the word column, 1 in your "b", "p", and "r" columns, 2 in your "o" and "k" columns, and 3 in your "e" column.
Once you have your entire dictionary imported with letter counts, you can fairly easily determine all the possible sub-words in a given word using the following method:
You could achieve this by making an in-memory array with 26 positions representing the alphabet
Example word: vehicle
SELECT Word FROM Dictionary WHERE NOT (
(a >= 1) OR (b >= 1) OR (c >= 2) ... OR (z >= 1)
)
Thus any word in your dictionary that has an 'a' or a 'z' in it are ruled out, since the query will filter out any words where the 'a' or 'z' count is at least one, and any word that uses more than one 'c' is filtered out.
You can easily generate all the "OR" conditions programmatically by using an array of 26 integers, all starting at 1, and then go through your word, adding 1 to the appropriate array value of each letter you find.
UPDATE - final count example code
Forgive my code sample below - it is going to be in ASP (VBScript) - but you should be able to grasp and translate to PHP, or have a kind person do this for you if not.
Const AsciiCodeLowerCaseA = 97
InputWord = "Carrots"
LowerCaseInputWord = LCase(InputWord)
Dim LetterCount(26)
for i = 1 to 26
LetterCount(i) = 1
next
for j = 1 to Len(InputWord)
CurrentLetter = Mid(InputWord, j, 1)
AsciiCode = Chr(CurrentLetter)
AlphabetPos = AsciiCode - AsciiCodeLowerCaseA + 1
LetterCount(AlphabetPos) = LetterCount(AlphabetPos) + 1
next
By converting each letter of the word to its ASCII value, then subtracting the ascii code for lower case 'a' and adding 1, you get the position of that letter in the alphabet from 1 to 26. You now add 1 to that position in the array.
It seems counterintuitive, but initialise all the letters to 1 in your array. When you build the SQL statement, you are eliminating all words with letter counts higher than your input word - so if a letter does not appear in your original word, you filter out words that have one or more of that letter. If the letter appears once, you filter out words that have two or more of that letter, and so on.
Upvotes: 0
Reputation: 402
Include a space on the end of the original word. Each iteration where the space ends up in the middle of letters, you'll get two words. Then you can test those two words. If the space is on the beginning or end of the iteration pattern, trim it off and test that one word.
Upvotes: 1