Reputation: 2187
I wish to produce METEOR scores for several Japanese strings. I have imported nltk, wordnet and omw, but the results do not convince me it is working correctly.
import nltk
from nltk.corpus import wordnet
from nltk.translate.meteor_score import single_meteor_score
nltk.download('wordnet')
nltk.download('omw')
reference = "チップは含まれていません。"
hypothesis = "チップは含まれていません。"
print(single_meteor_score(reference, hypothesis))
This outputs 0.5, but surely it should be much closer to 1.0 given that the reference and hypothesis are identical?
Do I somehow need to specify which WordNet language to use in the call to single_meteor_score()? For example:
single_meteor_score(reference, hypothesis, wordnet=wordnetJapanese)
Upvotes: 0
Views: 471
Reputation: 2187
Pending review by a qualified linguist, I appear to have found a solution. I found an open-source tokenizer for Japanese, pre-processed all of my reference and hypothesis strings to insert spaces between the Japanese tokens, and then ran single_meteor_score() over the files.
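For anyone wondering where the 0.5 came from, my best guess is the METEOR fragmentation penalty rather than the WordNet language. An unsegmented Japanese sentence contains no spaces, so it is treated as a single token. With one match in one chunk, the penalty gamma * (chunks / matches) ** beta is at its maximum of 0.5. The sketch below is not NLTK's code: it is a simplified version of the formula that only handles identical token lists, assuming the defaults I believe NLTK uses (alpha=0.9, beta=3.0, gamma=0.5). The function name and the hand-written segmentation are made up for illustration and are not the output of any real tokenizer.

```python
def meteor_for_identical(tokens, alpha=0.9, beta=3.0, gamma=0.5):
    """Simplified METEOR for the special case hypothesis == reference.

    Assumption: every token matches exactly and the whole sentence
    forms one contiguous chunk, so precision = recall = 1.
    """
    matches = len(tokens)
    if matches == 0:
        return 0.0
    precision = recall = 1.0
    fmean = (precision * recall) / (alpha * precision + (1 - alpha) * recall)
    chunks = 1  # identical sentences align as a single chunk
    penalty = gamma * (chunks / matches) ** beta
    return (1 - penalty) * fmean


sentence = "チップは含まれていません。"

# No whitespace in the string, so splitting yields exactly one token.
unsegmented = sentence.split()

# Hypothetical segmentation, written by hand for illustration only.
segmented = ["チップ", "は", "含ま", "れ", "て", "い", "ませ", "ん", "。"]

print(len(unsegmented), meteor_for_identical(unsegmented))
print(len(segmented), meteor_for_identical(segmented))
```

With one token the score is 0.5, and it approaches 1.0 as the number of tokens grows, which matches what I saw after segmenting. Also note that, as far as I can tell, recent NLTK releases expect pre-tokenized lists of strings rather than raw strings, so it may be necessary to pass the tokenizer output directly instead of a space-joined string.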
Upvotes: 0