Takahiro Fujiwara

Reputation: 31

In nltk wordnet, wn.synsets.definition(lang="lang") shows English and Japanese, but not other languages

wn.synsets.definition(lang="lang") shows results for English and Japanese, but not for other languages.

wn.synset('word').lemma_names shows the other languages too, though.

Do I need an extra download, or is there a difference between the languages?


The documentation says that it downloads lazily, so I tried a few times, but the result didn't change.

Upvotes: 3

Views: 402

Answers (1)

ljdyer

Reputation: 2086

I played around a bit, and the first thing I found is that definitions are available for more languages than just English and Japanese. The table below shows definitions of a few words, including your example word, for all the languages returned by wn.langs() after downloading the NLTK omw-1.4 data. 'dog' has definitions in 7 languages, 'house' in 9, and 'person' in 11.

Regarding the missing definitions for certain languages, I think the data just isn't present in the corresponding wordnets. The NLTK wordnet documentation states:

This module also allows you to find lemmas in languages other than English from the Open Multilingual Wordnet (https://omwn.org/)

If you go to https://omwn.org/ and follow the links for the respective wordnets, you'll find for example this page where you can search for words in a few languages. Searching 'casa' in Spanish, you'll find the definition reverts to the English definition for 'house', but for Italian there is a definition in Italian - which is consistent with the table below.
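If you want the same "revert to English" behaviour in your own code, here is a minimal sketch. It assumes, as in the code further down, that definition(lang=...) gives back None when a wordnet has no definition and may give back a list for non-English languages. The names first_definition, definition_with_fallback and nltk_lookup are my own helpers for illustration, not part of NLTK.

```python
from typing import Callable, List, Optional, Tuple, Union

RawDefinition = Union[str, List[str], None]


def first_definition(raw: RawDefinition) -> Optional[str]:
    """Reduce a raw lookup result to a single string, or None if nothing is there."""
    if isinstance(raw, list):
        return raw[0] if raw else None
    return raw or None


def definition_with_fallback(
    lookup: Callable[[str], RawDefinition],
    lang: str,
    fallback_lang: str = "eng",
) -> Tuple[str, Optional[str]]:
    """Return (language actually used, definition), falling back when lang has none."""
    found = first_definition(lookup(lang))
    if found is not None:
        return lang, found
    return fallback_lang, first_definition(lookup(fallback_lang))


def nltk_lookup(word: str) -> Callable[[str], RawDefinition]:
    """Build a lookup for the first synset of `word`.

    Assumes nltk is installed and 'wordnet' and 'omw-1.4' are downloaded.
    """
    from nltk.corpus import wordnet as wn

    synset = wn.synsets(word)[0]
    return lambda lang: synset.definition(lang=lang)


# Stand-in data so the sketch runs without NLTK; values mimic the table in this answer.
fake = {"eng": "a dwelling", "ita": ["edificio destinato ad abitazione"], "spa": None}
print(definition_with_fallback(fake.get, "ita"))
print(definition_with_fallback(fake.get, "spa"))
```

With NLTK set up, definition_with_fallback(nltk_lookup('house'), 'spa') should give you the English text tagged as 'eng', so you can tell a real Spanish definition apart from a fallback.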

Hope this helps!

| lang | dog | house | person |
| --- | --- | --- | --- |
| eng | a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds | a dwelling that serves as living quarters for one or more families | a human being |
| als |  | Ndërtesë për të banuar (zakonisht për një familje a për familje të një gjaku); banesë; apartament ku banon një familje. | të qënurit njeri |
| arb |  |  |  |
| bul | Вид домашно животно от семейство хищни бозайници, с различна големина, цвят на козината и различни породи, което лае и често се използва като пазач на дома и имота, за лов, може да бъде дресирано и обучавано за различни служебни цели. | Сграда,помещение за постоянно живеене на отделно семейство или човек. | Отделен човек, който със своите неповторими качества се отличава, различава от другите хора. |
| cmn |  |  |  |
| dan |  |  |  |
| ell | σκύλος του γένους Canis familiaris που συνήθως προέρχεται από τον κοινό λύκο και έχει εξημερωθεί από τους προϊστορικούς χρόνους | το τμήμα οικήματος (λ .χ. το διαμέρισμα πολυκατοικίας) στο οποίο διαμένει κανείς | το έμβιο ον, κάθε άτομο, άνθρωπος ανεξαρτήτως φύλου και ηλικίας |
| fin |  |  |  |
| fra |  |  |  |
| heb |  | מבנה המשמש כמקום מגורים למשפחה אחת או יותר | מישהו דופק בדלת |
| hrv |  |  |  |
| isl |  |  |  |
| ita | mammifero domestico dei canidi, molto comune, diffuso in tutto il mondo, con attitudini varie a seconda della razza | edificio destinato ad abitazione | entità umana considerata in quanto tale, senza caratterizzazioni di sesso, età, provenienza, ecc. |
| ita_iwn | animale domestico molto comune, diffuso in tutto il mondo, usato per la caccia, la difesa, nella pastorizia, e come animale da compagnia |  | essere distinto da ogni altro della medesima specie |
| jpn | 有史以前から人間に家畜化されて来た(おそらく普通のオオカミを先祖とする)イヌ属の動物 | 1家族以上のための居住棟として機能する住居 | 一人の人間 |
| cat |  |  |  |
| eus |  |  |  |
| glg |  |  |  |
| spa |  |  |  |
| ind |  |  | seseorang yang dipandang tinggi |
| zsm |  |  |  |
| nld |  |  |  |
| nno |  |  |  |
| nob |  |  |  |
| pol |  |  |  |
| por |  |  |  |
| ron | Animal mamifer carnivor domesticit, folosit pentru pază, vânătoare etc.. | construcție destinată pentru a servi de locuință uneia sau mai multor familii | Individ al speciei umane, om considerat prin totalitatea însușirilor sale fizice și psihice |
| lit |  |  |  |
| slk |  |  |  |
| slv |  |  |  |
| swe |  |  |  |
| tha |  |  |  |
| total | 7 | 9 | 11 |

Code used to generate the above table (in Google Colab):

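import nltk
import pandas as pd
from nltk.corpus import wordnet as wn

nltk.download('wordnet')
nltk.download('omw-1.4')

words = ['dog', 'house', 'person']
langs = list(wn.langs())
# One row per language, one column per word; object dtype so cells can hold text or None.
defs = pd.DataFrame(index=langs, columns=words, dtype='object')
for lang in langs:
    for word in words:
        # Definition of the word's first synset in this language (None if absent).
        def_ = wn.synsets(word)[0].definition(lang=lang)
        # Non-English lookups may return a list; keep its first entry.
        if isinstance(def_, list):
            def_ = def_[0] if def_ else None
        defs.at[lang, word] = def_
# Count the definitions for every word before adding the 'total' row;
# counting after that row exists would also count its NaN placeholders.
defs.loc['total'] = defs.notna().sum()
defs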
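
To see at a glance which wordnets carry no definitions at all for the sampled words, the same data can be summarised per language instead of per word. This is only a sketch: definition_coverage and languages_without_definitions are names I made up, and the sample dict is a hand-copied excerpt of the table above rather than live NLTK output.

```python
from typing import Dict, List, Optional

DefinitionTable = Dict[str, Dict[str, Optional[str]]]


def definition_coverage(table: DefinitionTable) -> Dict[str, int]:
    """Map each language code to the number of words that have a definition."""
    return {
        lang: sum(1 for text in row.values() if isinstance(text, str) and text)
        for lang, row in table.items()
    }


def languages_without_definitions(table: DefinitionTable) -> List[str]:
    """List the languages in which none of the sampled words has a definition."""
    return [lang for lang, count in definition_coverage(table).items() if count == 0]


# Hand-copied excerpt of the table above (two words, three languages).
sample: DefinitionTable = {
    "eng": {"house": "a dwelling that serves as living quarters for one or more families",
            "person": "a human being"},
    "ind": {"house": None, "person": "seseorang yang dipandang tinggi"},
    "spa": {"house": None, "person": None},
}
print(definition_coverage(sample))
print(languages_without_definitions(sample))
```

If you have the defs DataFrame from the code above, defs.drop(index='total').to_dict(orient='index') should give a dict of the same shape to pass in.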

Upvotes: 2
