Vinicius Tavares

Reputation: 653

Graph Database - How to deal with multilingual data

I'm trying to design a multilingual graph database, but I'm struggling with how to arrive at an optimal model.

My current proposal is to make two node types: Movie and MovieTranslation.

Movie holds all relationships, such as likes, related, ratings and comments. MovieTranslation contains all translatable data (title, plot, genres). A Movie node does not contain these kinds of properties, only the original_title.

Movie and MovieTranslation are tied together by a translation relationship.

When I query nodes, I would check whether they have a translation relationship for the queried locale (en_US for example). If so, I would merge the translation with the main node as the result.

I suspect this isn't the best approach, but I can't think of a better one.

Do you guys have a better suggestion for the database model? It would be much appreciated.

I'm using neo4j, if you need this information.

Thanks, Vinicius.

Upvotes: 6

Views: 1482

Answers (2)

Stephen Cremin

Reputation: 31

I suggest moving the original title to its own node as well; call it MovieTitle. "Complicating" your model in this way should actually "simplify" (or at least standardise) your queries, because you're always looking in one place for film titles (also for indexing and searching).

You're assuming that films only have one original title, which isn't the case. A Korea-Japan co-production will have at least two original titles. Whole genres of Japanese cinema were released with different original Japanese titles in cinemas and on VHS.

Distinct from the idea of an original title is that of specific language titles. The same film released in different Chinese-speaking countries will have different Chinese-language titles that are deemed more marketable to the specific local audiences.

To get the original title:
MATCH (c:Country)<-[:HAS_NATIONALITY]-(m:Movie)-[:HAS_TITLE]->(t:MovieTitle)-[:HAS_NATIONALITY]->(c) WHERE m.id = 1 RETURN collect([t.title, c.country_code])

To get the original title in China:
MATCH (m:Movie)-[:HAS_TITLE]->(t:MovieTitle)-[:HAS_NATIONALITY]->(c:Country) WHERE c.country_code = "CN" RETURN m, collect([t.title, c.country_code])

To get all language titles:
MATCH (m:Movie)-[:HAS_TITLE]->(t:MovieTitle)-[:HAS_NATIONALITY]->(c:Country)-[:HAS_LANGUAGE]->(l:Language) RETURN m, collect([t.title, l.language_code])

To get all Chinese-language titles:
MATCH (m:Movie)-[:HAS_TITLE]->(t:MovieTitle)-[:HAS_NATIONALITY]->(c:Country)-[:HAS_LANGUAGE]->(l:Language) WHERE l.language_code = "zh" RETURN m, collect([t.title, c.name])
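
For anyone who wants to see the shape of this title model without a running database, here is a rough in-memory sketch in Python. The sample titles, the `COUNTRY_LANGUAGE` table and the function names are invented for illustration; only the idea that a title hangs off a country, and a country off a language, comes from the queries above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MovieTitle:
    title: str
    country_code: str  # country the title belongs to (HAS_NATIONALITY)


# Hypothetical lookup standing in for (:Country)-[:HAS_LANGUAGE]->(:Language)
COUNTRY_LANGUAGE = {"KR": "ko", "JP": "ja", "CN": "zh", "TW": "zh"}


def original_titles(movie_countries, titles):
    """Titles whose country is one of the movie's own nationalities."""
    return [t for t in titles if t.country_code in movie_countries]


def titles_in_language(titles, language_code):
    """Titles released in any country that uses the given language."""
    return [t for t in titles
            if COUNTRY_LANGUAGE.get(t.country_code) == language_code]


# Invented sample data: a Korea-Japan co-production with four titles
movie_countries = {"KR", "JP"}
titles = [
    MovieTitle("Korean title", "KR"),
    MovieTitle("Japanese title", "JP"),
    MovieTitle("Mainland Chinese title", "CN"),
    MovieTitle("Taiwanese title", "TW"),
]

print(original_titles(movie_countries, titles))
print(titles_in_language(titles, "zh"))
```

The co-production ends up with two original titles, and the Chinese-language search returns one title per Chinese-speaking country, which is the distinction the model is meant to capture.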

I would separate plot and genre into their own nodes. There is an argument that different national cinemas have unique genres, but if westerns and samurai dramas are both sub-genres of period dramas then you want to find them both on a period drama search.
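
A small Python sketch of that sub-genre idea, assuming each genre points at one parent genre. The `GENRE_PARENT` table, the film names and the helper functions are made up; in the graph this would be a variable-length path rather than a loop.

```python
# Hypothetical parent links standing in for genre-to-genre relationships
GENRE_PARENT = {
    "western": "period drama",
    "samurai drama": "period drama",
    "period drama": None,
    "romantic comedy": None,
}


def genre_and_ancestors(genre):
    """Yield a genre followed by each of its parent genres."""
    while genre is not None:
        yield genre
        genre = GENRE_PARENT.get(genre)


def movies_in_genre(movie_genres, wanted):
    """Movies tagged with `wanted` or with any of its sub-genres."""
    return sorted(movie for movie, genre in movie_genres.items()
                  if wanted in genre_and_ancestors(genre))


# Invented sample data
movie_genres = {
    "Film A": "western",
    "Film B": "samurai drama",
    "Film C": "romantic comedy",
}

print(movies_in_genre(movie_genres, "period drama"))
```

Searching for "period drama" finds both the western and the samurai drama, while searching for "western" still finds only the western.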

I would still have the idea of Translation nodes, but don't confuse them with the domain you're modelling. It should be domain-ignorant and - for simple words/phrases like "romantic comedy" - should almost be a third-party graph plug-in.

Get the French-language genre titles of a specific film:
MATCH (m:Movie)-[:HAS_GENRE*]->(g:Genre)-[:HAS_TRANSLATION]->(t:Translation)-[:HAS_LANGUAGE]->(l:Language) WHERE m.id = 100 AND l.language_code = "fr" RETURN collect(t.translation)

Get all romantic comedies:
MATCH (m:Movie)-[:HAS_GENRE*]->(g:Genre)-[:HAS_TRANSLATION]->(t:Translation) WHERE t.translation = "comédie romantique" RETURN m

Unlike movie titles and genres, plots are altogether simpler, because you're modelling the film's story as a blob of text and not as a domain object in itself. Perhaps later you may do textual analysis on the plot texts to find themes, gender bias, etc., and model this in the graph as well.

Get the French language plot for a specific movie:
MATCH (m:Movie)-[:HAS_PLOT]->(p:Plot)-[:HAS_LANGUAGE]->(l:Language)-[:HAS_TRANSLATION]->(t:Translation) WHERE m.id = 100 AND t.translation = "French" RETURN p.plot

(Please treat the Cypher queries as pseudo-code. I didn't make a graph and test them.)

Upvotes: 3

Michael Hunger

Reputation: 41676

I think the model is ok.

You can RETURN movie, translation or RETURN {movie:movie, translation:translation}

Currently, converting nodes to maps and combining these maps is not supported; that's something on the roadmap.

How and where would you want to use the nodes? If for rendering, you can just access the two columns or entries. If for graph visualization you can also combine them into a node in the json source for the viz.
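
If the combining has to happen on the client, a minimal Python sketch could look like this. It assumes each result row has already been converted to a pair of plain property dicts (movie, translation), with the translation being None when the locale has no match. The `merge_translation` helper, the `locale` property and the sample rows are illustrative assumptions, not part of any driver API.

```python
import json


def merge_translation(movie, translation):
    """Overlay translated properties on the movie's own properties.

    Both arguments are plain dicts of node properties; `translation`
    may be None when no translation exists for the requested locale.
    """
    merged = dict(movie)
    if translation:
        merged.update(translation)
    else:
        # Fall back to the untranslated title so there is always one to show
        merged["title"] = movie["original_title"]
    return merged


# Sample rows shaped like the result of RETURN movie, translation
rows = [
    ({"id": 1, "original_title": "Cidade de Deus"},
     {"locale": "en_US", "title": "City of God"}),
    ({"id": 2, "original_title": "Central do Brasil"}, None),
]

nodes = [merge_translation(movie, translation) for movie, translation in rows]
print(json.dumps(nodes, ensure_ascii=False, indent=2))
```

The resulting list can be rendered directly or dropped into the JSON source of a visualization as one node per movie.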

Upvotes: 1
