user4432226


Does anybody know the structure of the Wiktionary XML file?

I'm going to parse Wiktionary dump files for several languages (English, Japanese, etc.). From this question (Parse Wiktionary XML data dump into MySQL database using PHP) I can see the basic structure of the file. But what do these elements stand for?

For example, I think the title element under the page element is a word in the vocabulary. But where are its translations into other languages? Where are its synonyms?

Upvotes: 3

Views: 1166

Answers (1)

Andrew Krizhanovsky

Reputation: 610

"...translation in other languages? Where are its synonyms?"

There are three pieces of bad news for you.

  1. All of this information (translations, synonyms) is stored as plain text inside the Wiktionary article.

  2. Different Wiktionaries structure their dictionary articles differently. For example, compare the structure of an article in the English Wiktionary with the same article in the Russian Wiktionary.

  3. The structure of a Wiktionary article is not represented in the XML file; the article is just plain text (see item 1). Thus you need to parse this text yourself in order to extract synonyms or translations.
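
To make the three points concrete, here is a minimal Python sketch, not a complete parser. It assumes the usual MediaWiki export layout (page, title, revision, text elements) and the English Wiktionary convention of writing translations with `{{t|lang|word}}` or `{{t+|lang|word}}` templates. The sample dump, the regular expression, and the helper names `iter_pages` and `extract_translations` are made up for illustration; other language editions use different templates and headings, so the pattern would have to be adapted per Wiktionary.

```python
import io
import re
import xml.etree.ElementTree as ET

# Tiny hand-written sample that imitates a dump (assumption: real dumps
# follow the same page/title/revision/text layout, with more metadata).
SAMPLE_DUMP = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>water</title>
    <ns>0</ns>
    <revision>
      <text>==English==
===Noun===
====Synonyms====
* {{l|en|aqua}}
====Translations====
* Japanese: {{t+|ja|水|tr=みず}}
* Russian: {{t+|ru|вода|f}}
</text>
    </revision>
  </page>
</mediawiki>"""

# Assumed English Wiktionary convention: {{t|lang|word}} or {{t+|lang|word}}.
TRANSLATION_RE = re.compile(r"\{\{t\+?\|([^|}]+)\|([^|}]+)")


def local_name(tag):
    """Drop the XML namespace so the code works across export versions."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, wikitext) pairs while streaming through the dump."""
    title, text = None, None
    for _, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text or ""
            title, text = None, None
            elem.clear()  # free memory; real dumps are very large


def extract_translations(wikitext):
    """Return (language code, word) pairs found in translation templates."""
    return TRANSLATION_RE.findall(wikitext)


pages = list(iter_pages(io.BytesIO(SAMPLE_DUMP.encode("utf-8"))))
for page_title, wikitext in pages:
    print(page_title, extract_translations(wikitext))
```

The XML layer only gives you the title and one block of wikitext per page; everything else (synonym sections, translation tables) has to be recovered from that text with rules specific to each Wiktionary.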

You are welcome to read my paper on transforming (parsing) the text of Wiktionary articles into a machine-readable database: http://arxiv.org/abs/1011.1368

Upvotes: 3
