How do I find a partial string matching the latter half of the word using mysql?

Question

Hello I'm creating a word rhymer for complete words but I'm stuck on the logic.

I imported the cmu dictionary text into two columns, word and arpabet. And this is what I have so far:

Code:
Select word, arpabet from cmudict 
where word LIKE "%dove" and substring_index(arpabet, ' ', -2) in
(Select substring_index(arpabet, ' ', -2) from cmudict);

Output (1 row returned):
word    arpabet
'DOVE', 'D AH1 V'

I'm looking for words whose arpabet matches the 'AH1 V' from within the last 2 spaced delimiters so it would give me a list of words that would sound like 'DOVE' while ignoring the stress number (ah1, ah2):

Output I'm looking for:

'STRANGELOVE', 'S T R EY1 N JH L AH2 V'
'LABOVE', 'L AH0 B AH1 V'
all other words similar sounding such as love, glove, etc...

I've tried changing the LIKE '%ove' but that produces a list of correct and incorrect matches with a similar suffix from the word column since it ignores the corresponding arpabet.

Any help would be great.

elcool · Accepted Answer

If you're trying to get words that match the sound, then I believe you should match with the column arpabet and not with word.

SELECT word, arpabet FROM cmudict 
WHERE REPLACE(REPLACE(REPLACE(substring_index(arpabet, ' ', -2),'0',''),'1',''),'2','') in
(SELECT REPLACE(REPLACE(REPLACE(substring_index(arpabet, ' ', -2),'0',''),'1',''),'2','') 
FROM cmudict WHERE word = "dove");

Breaking it down:

REPLACE(REPLACE(REPLACE( will eliminate the 0,1,2 from any of the sounds. Since you don't want to match with the numbers (stress). Looks kinda ugly, but it gets it done.
The substring_index(arpabet, ' ', -2) stays the same, but now it's inside the REPLACEs.
The subquery will get the arpabet without the numbers, of the word 'DOVE'. If you run this subquery alone, should return 'AH V'.

How do I find a partial string matching the latter half of the word using mysql?

Answers (1)

Related Questions