asorenson
asorenson

Reputation: 169

Translation dictionary - selecting subset of translations

I am implementing a translation dictionary with a structure based on this post: https://stackoverflow.com/a/16919801/4527140

The server DB is MySQL, and the client is SQLite (on Android), but for this particular issue, only the server (MySQL) comes into play.

Table structure:

CREATE TABLE word (
  _id INT NOT NULL AUTO_INCREMENT,
  language_id INT NOT NULL,
  word VARCHAR(255) NOT NULL,
  update_time DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (_id) ,
  UNIQUE INDEX ui_word_lang_idx (language_id ASC, word ASC) ,
  CONSTRAINT fk_word_language
    FOREIGN KEY (language_id)
    REFERENCES language (_id))

CREATE TABLE IF NOT EXISTS translation (
  _id INT NOT NULL AUTO_INCREMENT,
  word_id1 INT NOT NULL,
  word_id2 INT NOT NULL,
  update_time DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (_id) ,
  UNIQUE INDEX ui_word1_word2_idx (word_id1 ASC, word_id2 ASC) ,
  UNIQUE INDEX ui_word2_word1_idx (word_id2 ASC, word_id1 ASC) ,
  CONSTRAINT fk_translation_word1
    FOREIGN KEY (word_id1)
    REFERENCES word (_id),
  CONSTRAINT fk_translation_word2
    FOREIGN KEY (word_id2)
    REFERENCES word (_id))

Data:

language
_id | language | update_time
----------------------------
1   | English  | 2000-01-01 01:00:00
2   | French   | 2000-01-01 01:00:00
3   | Spanish  | 2000-01-01 01:00:00
4   | Turkish  | 2000-01-01 01:00:00

word
_id | language_id | word    | update_time
-----------------------------------------
1   | 1           | LOVE    | 2000-01-01 01:00:00
2   | 1           | HATE    | 2000-01-01 01:00:00
3   | 2           | AMOUR   | 2000-01-01 01:00:00
4   | 2           | HAINE   | 2000-01-01 01:00:00
5   | 1           | LUST    | 2000-01-01 01:00:00
6   | 3           | LUJURIA | 2000-01-01 01:00:00

translation
_id | word_id1 | word_id2 | update_time
---------------------------------------
1   | 1        | 3        | 2000-01-01 01:00:00
2   | 2        | 4        | 2000-01-01 01:00:00
4   | 5        | 6        | 2000-01-01 01:00:00

The client app will only download from the server DB the words and translations for the languages they have installed, and will download periodic updates, based on a timestamp on the table. I am trying to figure out a SELECT query that will get the words for both the base language (in this case, English is always installed and has language_id =1) and the other installed languages, but only the words for which there is a translation mapping between the installed languages. Thus, I do not wish to select words for which there is no mapping in the installed set of languages.

With the above data, if I have only installed French, I would only select and download:

word
_id | language_id | word    | update_time
-----------------------------------------
1   | 1           | LOVE    | 2000-01-01 01:00:00
2   | 1           | HATE    | 2000-01-01 01:00:00
3   | 2           | AMOUR   | 2000-01-01 01:00:00
4   | 2           | HAINE   | 2000-01-01 01:00:00

translation
_id | word_id1 | word_id2 | update_time
---------------------------------------
1   | 1        | 3        | 2000-01-01 01:00:00
2   | 2        | 4        | 2000-01-01 01:00:00

Upvotes: 1

Views: 224

Answers (1)

Juan Carlos Oropeza
Juan Carlos Oropeza

Reputation: 48197

Lets remove the date variable for now.

SQL FIDDLE DEMO

select distinct w1.word_id, w1.lenguaje_id, w1.word
from 
    word w1 inner join 
    translation t on (w1.word_id = t.word_id1 or w1.word_id = t.word_id2) inner join
    word w2 on ((t.word_id1 = w2.word_id or t.word_id2 = w2.word_id) 
                 and w2.word_id <> w1.word_id)
where 
    (w1.lenguaje_id = 1 or w2.lenguaje_id = 1)
and (w1.lenguaje_id in (2,3) or w2.lenguaje_id in (2,3))
  • First Join: The word is either on left or right side in translation.
  • Second Join: Same as above but also make sure isn't the same word.
  • First Where: Make sure any of the 2 word is english
    But if you only have english->other translation you dont need this one because is always true.
  • Second Where: Make sure the other word is an leguaje installed by user <> to english.
    Same as above if english->other you should design first id always english and simplify the OR

My sample:

  • English have 6 sins (not Sloth)
  • Spanish have 5 sins (have Sloth)
  • French have 4 sins (have Sloth)
  • Spanish French both have Wrath

In this case you dont get Spanish to French sloth.
If you change the select distinct to select * in the demo you could see a litle more what is happening.

This is only the word select. You will also need a select for translation. But that is a new requirement and I'm going to bed now, but should be easy once you have this one.

Upvotes: 2

Related Questions