Reputation: 13
I am trying to apply NLTK's sentence_bleu to each row of a pandas DataFrame to rate the quality of some machine translations, but the scores it outputs are incorrect. Can anyone see my error?
import pandas as pd
from nltk.translate.bleu_score import sentence_bleu
translations = {
'reference': [['this', 'is', 'a', 'test'],['this', 'is', 'a', 'test'],['this', 'is', 'a', 'test']],
'candidate': [['this', 'is', 'a', 'test'],['this', 'is', 'not','a', 'quiz'],['I', 'like', 'kitties', '.']]
}
df = pd.DataFrame(translations)
df['BLEU'] = df.apply(lambda row: sentence_bleu(row['reference'],row['candidate']), axis=1)
df
It outputs this:
Index  reference            candidate                 BLEU
0      [this, is, a, test]  [this, is, a, test]       1.288230e-231
1      [this, is, a, test]  [this, is, not, a, quiz]  1.218332e-231
2      [this, is, a, test]  [I, like, kitties, .]     0.000000e+00
Row 0 should be equal to 1.0, and row 1 should be less than 1.0, probably around 0.9. What am I doing wrong?
Upvotes: 1
Views: 859
Reputation: 444
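sentence_bleu expects its first argument to be a list of references and its second to be a single candidate. Passing row['reference'] directly means each word in the list is treated as a separate reference string, so the candidate is compared against the individual words rather than against the whole sentence. Since each of these strings contains only a single word, no n-gram with n > 1 can match, and those precisions are rated as 0.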
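To make that concrete, here is a small sketch of how I understand the original call to be interpreted. It uses nltk.util.ngrams directly; the variable names (reference_as_passed and so on) are made up for illustration and are not from the question.

```python
from nltk.util import ngrams

# What the question passes as "references": NLTK sees four separate
# references, each of which is a plain string.
reference_as_passed = ['this', 'is', 'a', 'test']
candidate = ['this', 'is', 'a', 'test']

# Iterating over a string yields characters, so the reference bigrams
# are pairs of characters.
reference_bigrams = {bg for ref in reference_as_passed for bg in ngrams(ref, 2)}

# The candidate is a list of words, so its bigrams are pairs of words.
candidate_bigrams = set(ngrams(candidate, 2))

print(sorted(reference_bigrams))
print(sorted(candidate_bigrams))

# No character pair can equal a word pair, so nothing overlaps.
print(reference_bigrams & candidate_bigrams)
```

The last line prints an empty set, which is why every precision above the unigram level comes out as 0.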
Instead, you want your reference to be ['this is a test'] (a list of ground-truth references) and the candidate to be 'this is a test' (a single candidate).
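import pandas as pd
from nltk.translate.bleu_score import sentence_bleu
translations = {
# each reference cell is a list of ground-truth sentences
'reference': [['this is a test'], ['this is a test'], ['this is a test']],
# each candidate cell is a single sentence
'candidate': ['this is a test', 'this is not a test', 'I like kitties']
}
df = pd.DataFrame(translations)
# score each candidate against its list of references, row by row
df['BLEU'] = df.apply(lambda row: sentence_bleu(row['reference'], row['candidate']), axis=1)
df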
This results in:
   reference         candidate           BLEU
0  [this is a test]  this is a test      1.000000e+00
1  [this is a test]  this is not a test  7.037906e-01
2  [this is a test]  I like kitties      6.830097e-155
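A side note, offered as a sketch rather than a definitive fix: because plain strings are iterated character by character, the scores above are computed over character n-grams. If word-level BLEU is what you are after, one option is to keep the tokenized candidates from the question and wrap each tokenized reference in an outer list. With sentences this short, the higher-order n-grams often have no matches at all, so this sketch also assumes you are fine with NLTK's SmoothingFunction().method1. That smoothing choice is my assumption, not something from the question.

```python
import pandas as pd
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

reference = ['this', 'is', 'a', 'test']

df = pd.DataFrame({
    # Each cell is a list of references, and each reference is a list of tokens.
    'reference': [[reference], [reference], [reference]],
    # Each cell is a single tokenized candidate.
    'candidate': [
        ['this', 'is', 'a', 'test'],
        ['this', 'is', 'not', 'a', 'quiz'],
        ['I', 'like', 'kitties', '.'],
    ],
})

# Assumption: method1 smoothing, so that n-gram orders with zero matches
# do not collapse the whole score for short sentences.
smooth = SmoothingFunction().method1

df['BLEU'] = df.apply(
    lambda row: sentence_bleu(
        row['reference'], row['candidate'], smoothing_function=smooth
    ),
    axis=1,
)
print(df)
```

Row 0 should come out as 1.0, and row 2 as 0 because it shares no words with the reference. Row 1 lands in between, though well below 0.9, since only one of its bigrams and none of its longer n-grams match.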
Upvotes: 1