Reputation: 73
I have a database table with words from a dictionary.
Now I want to select words for an anagram. For example if I give the string SEPIAN
it should fetch values like apes
, pain
, pains
, pies
, pines
, sepia
, etc.
For this I used the query
SELECT * FROM words WHERE word REGEXP '^[SEPIAN]{1,6}$'
But this query returns words like anna
, essen
which have repeated characters not in the supplied string. Eg. anna
has two n
's but there is only one n
in the search string SEPIAN
.
How can I write my regular expression to achieve this? Also if there are repeated characters in my search string at that time the repeated characters should reflect in the result.
Upvotes: 7
Views: 727
Reputation: 6718
I guess something like this will help you. Table words
:
| id | word | alfagram |
---------------------------------
| 1 | karabar | aaabkrr |
| 2 | malabar | aaablmr |
| 3 | trantantan| aaannnrttt|
alfagram
here is letters of a word in an alphabetical order.
PHP code:
$searchString = 'abrakadabra';
$searchStringAlfa = array();
for( $i=0,$c=strlen($searchString);$i<$c;$i++ ){
if( isset($searchStringAlfa[$searchString[$i]]) ){
$searchStringAlfa[$searchString[$i]]++;
}else{
$searchStringAlfa[$searchString[$i]] = 1;
}
}
ksort($searchStringAlfa);
$regexp = '^';
foreach( $searchStringAlfa as $alfa=>$amount ){
$regexp .= '['.$alfa.']{0,'.$amount.'}';
}
$regexp .= '$';
$searchString
is the string you wish to search with. Then the only thing you should do is execute the query:
$result = mysql_query('SELECT * FROM words WHERE alfagram REGEXP "'.$regexp.'"');
May be some additional checks and optimisations are needed
Upvotes: 2
Reputation: 8486
Since MySQL does not support back-referencing capturing groups, the typical solution of (\w).*\1
will not work. This means that any solution given will need to enumerate all possible doubles. Furthermore, as far as I can tell back-references are not valid in look-aheads or look-behinds, and look-aheads and look-behinds are not supported in MySQL.
However, you can split this into two expressions, and use the following query:
SELECT * FROM words
WHERE word REGEXP '^[SEPIAN]{1,6}$'
AND NOT word REGEXP 'S.*?S|E.*?E|P.*?P|I.*?I|A.*?A|N.*?N'
Not very pretty, but it works and it should be fairly efficient as well.
To support a set limit of repeated characters, use the following pattern for your secondary expression:
A(.*?A){X,}
Where A
is your character and X
is the number of times it's allowed.
So if you're adding another N
to your string SEPIANN
(for a total of 2 N
s), your query would become:
SELECT * FROM words
WHERE word REGEXP '^[SEPIAN]{1,7}$'
AND NOT word REGEXP 'S.*?S|E.*?E|P.*?P|I.*?I|A.*?A|N(.*?N){2}'
Upvotes: 5