user3871

Reputation: 12718

Storing language dictionaries in database

I'm creating a language app that currently only features Mandarin Chinese and Spanish.

Currently, I have a self-created dictionary that is simply loaded as JSON rather than stored in the DB, but I've found full downloadable dictionaries, such as CEDICT for Chinese, that can provide the definitions for me. However, this file is 115k rows long, with 6 columns per row.

I also need to do this for Spanish, and then every other language I plan on including.

Notes:

So what's the best way to store this data?

I'm assuming separate tables, dictionary_zh and dictionary_es, but I could also store every dictionary in a single dictionary table with an added locale column and filter queries on that. This SO answer states that 1M records isn't "too much" for a table to handle; it simply depends on how you index the table.
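Since the row count matters less than the indexing, here's a minimal sketch of the single-table option. It assumes SQLite through Python's sqlite3 module (the app's actual DB isn't stated) and hypothetical column names locale, headword and meaning. The composite index leads with locale, so one index serves lookups in every language; the separate-table option would be the same schema without the locale column.

```python
import sqlite3
from typing import List

# Assumption: SQLite stands in for the app's real database, and the
# column names (locale, headword, meaning) are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dictionary (
        locale   TEXT NOT NULL,  -- e.g. 'zh' or 'es'
        headword TEXT NOT NULL,  -- the word or character being looked up
        meaning  TEXT NOT NULL
    )
    """
)
# Leading with locale lets a single index serve lookups for every language.
conn.execute(
    "CREATE INDEX idx_dictionary_locale_headword ON dictionary (locale, headword)"
)

conn.executemany(
    "INSERT INTO dictionary (locale, headword, meaning) VALUES (?, ?, ?)",
    [
        ("zh", "佟", "surname Tong"),
        ("es", "perro", "dog"),  # hypothetical Spanish row
    ],
)
conn.commit()

LOOKUP_SQL = "SELECT meaning FROM dictionary WHERE locale = ? AND headword = ?"


def lookup(locale: str, headword: str) -> List[str]:
    """Return every meaning recorded for `headword` in `locale`."""
    return [meaning for (meaning,) in conn.execute(LOOKUP_SQL, (locale, headword))]


# EXPLAIN QUERY PLAN shows whether SQLite uses the index instead of a full scan.
for row in conn.execute("EXPLAIN QUERY PLAN " + LOOKUP_SQL, ("zh", "佟")):
    print(row[-1])

print(lookup("zh", "佟"))  # ['surname Tong']
```

With the index in place, each lookup is an index seek whether the table holds 115k rows or a few million across several languages.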


Btw, does anyone have a recommendation for a good downloadable Spanish-English dictionary?


Note: I'm downloading the dictionary and cutting it up into a CSV that I can load, in this format:

Traditional  Simplified  Pinyin  Meaning       Level  Quest
佟           佟          Tong2   surname Tong  1      2
...

I'm translating by simply passing in the identifying character (in this case, 佟) and grabbing its Meaning.
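A minimal sketch of loading that CSV and doing the character lookup, assuming SQLite through Python's csv and sqlite3 modules. The table name dictionary_zh comes from the question; the lower-case column names, the inline sample data and the integer types for Level and Quest are assumptions.

```python
import csv
import io
import sqlite3
from typing import List

# Hypothetical sample in the same layout as the cut-up CEDICT file; in practice
# this would be open("cedict.csv", newline="", encoding="utf-8").
SAMPLE_CSV = """Traditional,Simplified,Pinyin,Meaning,Level,Quest
佟,佟,Tong2,surname Tong,1,2
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dictionary_zh (
        traditional TEXT NOT NULL,
        simplified  TEXT NOT NULL,
        pinyin      TEXT NOT NULL,
        meaning     TEXT NOT NULL,
        level       INTEGER,  -- assumed integer, as in the sample row
        quest       INTEGER   -- assumed integer, as in the sample row
    )
    """
)
# Index the lookup column so each translation is an index seek,
# not a scan over ~115k rows.
conn.execute("CREATE INDEX idx_dictionary_zh_simplified ON dictionary_zh (simplified)")

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
with conn:  # commits the whole bulk insert as one transaction
    conn.executemany(
        "INSERT INTO dictionary_zh VALUES (?, ?, ?, ?, ?, ?)",
        (
            (r["Traditional"], r["Simplified"], r["Pinyin"], r["Meaning"],
             int(r["Level"]), int(r["Quest"]))
            for r in reader
        ),
    )


def translate(character: str) -> List[str]:
    """Return every Meaning whose Simplified form matches `character`."""
    rows = conn.execute(
        "SELECT meaning FROM dictionary_zh WHERE simplified = ?", (character,)
    )
    return [meaning for (meaning,) in rows]


print(translate("佟"))  # ['surname Tong']
```

Returning a list rather than a single string is deliberate: CC-CEDICT can contain several entries for the same characters with different readings (e.g. 行 as xing2 and hang2).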

Upvotes: 1

Views: 524

Answers (1)

user591272


I would store each dictionary in a separate table and abstract how the definition of a word is fetched for a given locale, so the caller doesn't need to know how a particular dictionary (the Dictionary type in the diagram below) performs its translation. This is useful when some of your dictionaries don't reside in your DB at all, such as ones that translate via an API.

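[UML diagram: a Dictionary type with a translate() method, implemented by ChineseDictionary and SpanishDictionary]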

The method translate() is implemented differently for each type of Dictionary (in your case ChineseDictionary or SpanishDictionary).
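A minimal Python sketch of that design, assuming SQLite and a hypothetical (headword, meaning) schema in each table. TableDictionary, ApiDictionary and the translate() helper are illustrative additions, not part of the answer's diagram; the lambda stands in for a real API client.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Callable, Dict, Optional


class Dictionary(ABC):
    """Callers ask for a translation without knowing how it is produced."""

    @abstractmethod
    def translate(self, word: str) -> Optional[str]:
        """Return the meaning of `word`, or None if it is unknown."""


class TableDictionary(Dictionary):
    """A dictionary stored in its own table (hypothetical schema: headword, meaning)."""

    table: str = ""  # set by subclasses; a fixed name, never user input

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def translate(self, word: str) -> Optional[str]:
        # Table names can't be bound as ? parameters, so the fixed name is interpolated.
        row = self.conn.execute(
            f"SELECT meaning FROM {self.table} WHERE headword = ?", (word,)
        ).fetchone()
        return row[0] if row else None


class ChineseDictionary(TableDictionary):
    table = "dictionary_zh"


class SpanishDictionary(TableDictionary):
    table = "dictionary_es"


class ApiDictionary(Dictionary):
    """Hypothetical dictionary that translates via an external API instead of the DB."""

    def __init__(self, fetch: Callable[[str], Optional[str]]) -> None:
        self.fetch = fetch  # stand-in for a real HTTP client call

    def translate(self, word: str) -> Optional[str]:
        return self.fetch(word)


def translate(dictionaries: Dict[str, Dictionary], locale: str, word: str) -> Optional[str]:
    """Pick the dictionary for `locale`; how it stores data is irrelevant here."""
    return dictionaries[locale].translate(word)


conn = sqlite3.connect(":memory:")
for table, headword, meaning in [
    ("dictionary_zh", "佟", "surname Tong"),
    ("dictionary_es", "perro", "dog"),  # hypothetical sample row
]:
    conn.execute(f"CREATE TABLE {table} (headword TEXT NOT NULL, meaning TEXT NOT NULL)")
    conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (headword, meaning))

dictionaries: Dict[str, Dictionary] = {
    "zh": ChineseDictionary(conn),
    "es": SpanishDictionary(conn),
    "fr": ApiDictionary(lambda word: {"chien": "dog"}.get(word)),  # fake API
}
print(translate(dictionaries, "zh", "佟"))     # surname Tong
print(translate(dictionaries, "fr", "chien"))  # dog
```

Adding a language then means adding one class (or one table) and one registry entry; the calling code stays the same.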

Another advantage of this approach, from a data-management point of view, is that you won't need many operations on the data when a new version of a dictionary is released, since each dictionary lives in its own table. That makes it cheap to maintain.
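As a rough illustration of why per-table storage keeps updates cheap, here's a sketch that swaps in a new release of one language's dictionary without touching any other table. It assumes SQLite (which supports DDL inside transactions), a hypothetical two-column schema, and an in-memory list standing in for the parsed new release; replace_dictionary is a hypothetical helper.

```python
import sqlite3


def replace_dictionary(conn: sqlite3.Connection, table: str, rows) -> None:
    """Load a new release into a staging table, then swap it in atomically.

    `table` must be a trusted, hard-coded name (it is interpolated into SQL).
    Hypothetical schema: (headword, meaning).
    """
    staging = f"{table}_staging"
    conn.execute("BEGIN")
    try:
        conn.execute(f"DROP TABLE IF EXISTS {staging}")
        conn.execute(
            f"CREATE TABLE {staging} (headword TEXT NOT NULL, meaning TEXT NOT NULL)"
        )
        conn.executemany(f"INSERT INTO {staging} VALUES (?, ?)", rows)
        # Only this language's table is replaced; other tables are untouched.
        conn.execute(f"DROP TABLE IF EXISTS {table}")
        conn.execute(f"ALTER TABLE {staging} RENAME TO {table}")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # a failed import leaves the old version in place
        raise


# isolation_level=None: the sqlite3 module does not open transactions itself,
# so the explicit BEGIN/COMMIT above control them.
conn = sqlite3.connect(":memory:", isolation_level=None)
replace_dictionary(conn, "dictionary_zh", [("佟", "surname Tong")])
replace_dictionary(conn, "dictionary_es", [("perro", "dog")])
# A new Chinese release only rebuilds dictionary_zh.
replace_dictionary(conn, "dictionary_zh", [("佟", "surname Tong"), ("好", "good")])
```

Loading into a staging table first means readers never see a half-imported dictionary, and a bad file simply rolls back.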

Upvotes: 1
