I have trained my BERT text encoder model. Now I want to calculate the corpus_bleu score on the prediction data, but I have no idea where to implement corpus_bleu in my code.
It is a caption-based model. For this, I have trained two models: one is an image model and the other is a text model. After training both models, I loaded them like below:
import tensorflow as tf
from tensorflow import keras
vision_encoder = keras.models.load_model("vision_encoder")
text_encoder = keras.models.load_model("text_encoder")
Reading image file
def read_image(image_path):
    image_array = tf.image.decode_jpeg(tf.io.read_file(image_path), channels=3)
    return tf.image.resize(image_array, (299, 299))
Converting caption to tensor
def read_text(caption):
    return tf.convert_to_tensor(caption)
Loading the whole dataframe with 3851 captions:
j = df['findings'].astype(str)
j.shape
(3851,)
Making predictions on the above dataframe using the trained text_encoder model:
text_embeddings = text_encoder.predict(
    tf.data.Dataset.from_tensor_slices(j).map(read_text).batch(batch_size),
    verbose=1,
)
print(f"Text embeddings shape: {text_embeddings.shape}.")
1926/1926 [==============================] - 25s 13ms/step
Text embeddings shape: (3851, 128, 256).
Code for finding the top matching captions for a given image:
def find_matches(t_embeddings, queries, k=9, normalize=True):
    # Read and embed the query image (queries holds the image paths).
    image_array = tf.image.decode_jpeg(tf.io.read_file(queries[0]), channels=3)
    imgr = tf.expand_dims(image_array, axis=0)
    i_embedding = vision_encoder(tf.image.resize(imgr, (299, 299)))
    # Normalize the caption and the image embeddings.
    if normalize:
        t_embeddings = tf.math.l2_normalize(t_embeddings, axis=1)
        i_embedding = tf.math.l2_normalize(i_embedding, axis=1)
    # Compute the dot product between the image and the caption embeddings.
    dot_similarity = tf.matmul(i_embedding, t_embeddings, transpose_b=True)
    print(dot_similarity.shape)
    # Retrieve the top k indices.
    results = tf.math.top_k(dot_similarity, k).indices.numpy()
    print(results.shape)
    # Return the matching captions.
    return [[df['findings'][idx] for idx in indices] for indices in results]
Applying the above function to find the relevant captions for an image:
img = "/content/image.png"
matches = find_matches(t_embeddings,
                       [img],
                       normalize=True)[0]
for i in range(9):
    print(matches[i])
Upvotes: 0
Views: 429
Reputation: 2056
By definition
BLEU (BiLingual Evaluation Understudy) is a metric for automatically evaluating machine-translated text. The BLEU score is a number between zero and one that measures the similarity of the machine-translated text to a set of high quality reference translations.
And it can be measured using a generated text and at least one reference text:
from nltk.translate.bleu_score import sentence_bleu

reference = [
    'this is a dog'.split(),
    'it is dog'.split(),
    'dog it is'.split(),
    'a dog, it is'.split()
]

candidate = 'it is dog'.split()
print('BLEU score -> {}'.format(sentence_bleu(reference, candidate)))

candidate = 'it is a dog'.split()
print('BLEU score -> {}'.format(sentence_bleu(reference, candidate)))
Keep in mind that the BLEU score has nothing to do with the model itself or the process of text generation. It only comes into play after the text has been generated.
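Since you asked about corpus_bleu specifically: it is the corpus-level variant of the same metric. It takes one list of references per candidate and aggregates the n-gram counts over the whole corpus, which is not the same as averaging sentence_bleu over the sentences. A minimal sketch, where gt_captions and pred_captions are hypothetical placeholders (in your pipeline they could be df['findings'] and, say, the top-1 caption returned by find_matches for each image):
from nltk.translate.bleu_score import corpus_bleu

# Hypothetical data: one ground-truth caption and one predicted caption per image.
gt_captions = ['the heart is normal in size', 'no acute disease']
pred_captions = ['the heart is normal', 'no acute cardiopulmonary disease']

# corpus_bleu expects a list of reference sets (one list of token lists
# per hypothesis) and a list of hypothesis token lists.
list_of_references = [[caption.split()] for caption in gt_captions]
hypotheses = [caption.split() for caption in pred_captions]

print('Corpus BLEU -> {}'.format(corpus_bleu(list_of_references, hypotheses)))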
There is another option for evaluating a language model on a corpus which I think suits you better here. It is called perplexity and can be briefly defined as:
In natural language processing, perplexity is a way of evaluating language models. A language model is a probability distribution over entire sentences or texts.
In terms of time and processing power, it can outperform the BLEU score. The only point to consider here is that the perplexity calculation takes place during the generation process itself.
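To make that concrete: perplexity is just the exponential of the average per-token negative log-likelihood, so it falls out of the cross-entropy loss the model already computes while generating. A minimal sketch with hypothetical per-token probabilities (token_probs is a made-up placeholder for whatever probability your model assigns to each generated token):
import math

# Hypothetical per-token probabilities p(token | context) from a language model.
token_probs = [0.20, 0.10, 0.05, 0.30]

# Perplexity = exp(average negative log-likelihood over the tokens).
nll = [-math.log(p) for p in token_probs]
perplexity = math.exp(sum(nll) / len(nll))
print('Perplexity -> {}'.format(perplexity))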
Upvotes: 0