I have trained my BERT text encoder model. Now I want to calculate the corpus_bleu score on the prediction data, but I have no idea where to implement corpus_bleu in my code.
It is a caption-based model. For this, I have trained two models: one is an image model and the other is a text model. After training both models, I loaded them like below:
import tensorflow as tf
from tensorflow import keras
vision_encoder = keras.models.load_model("vision_encoder")
text_encoder = keras.models.load_model("text_encoder")
Reading image file
def read_image(image_path):
    image_array = tf.image.decode_jpeg(tf.io.read_file(image_path), channels=3)
    return tf.image.resize(image_array, (299, 299))
Converting caption to tensor
def read_text(caption):
    return tf.convert_to_tensor(caption)
Loading the whole dataframe with 3851 captions:
j = df['findings'].astype(str)
j.shape
(3851,)
Making predictions on the above dataframe using the trained text_encoder model:
text_embeddings = text_encoder.predict(
    tf.data.Dataset.from_tensor_slices(j).map(read_text).batch(batch_size),
    verbose=1,
)
print(f"Text embeddings shape: {text_embeddings.shape}.")
1926/1926 [==============================] - 25s 13ms/step
Text embeddings shape: (3851, 128, 256).
Code for finding the top matching captions for a given image:
def find_matches(t_embeddings, queries, k=9, normalize=True):
    # Read and embed the query image (queries holds the image paths).
    image_array = tf.image.decode_jpeg(tf.io.read_file(queries[0]), channels=3)
    imgr = tf.expand_dims(image_array, axis=0)
    i_embedding = vision_encoder(tf.image.resize(imgr, (299, 299)))
    # Normalize the caption and the image embeddings.
    if normalize:
        t_embeddings = tf.math.l2_normalize(t_embeddings, axis=1)
        i_embedding = tf.math.l2_normalize(i_embedding, axis=1)
    # Compute the dot product between the image and the caption embeddings.
    dot_similarity = tf.matmul(i_embedding, t_embeddings, transpose_b=True)
    print(dot_similarity.shape)
    # Retrieve the top k indices.
    results = tf.math.top_k(dot_similarity, k).indices.numpy()
    print(results.shape)
    # Return the matching captions.
    return [[df['findings'][idx] for idx in indices] for indices in results]
Applying the above function to find the relevant captions for an image:
img = "/content/image.png"
matches = find_matches(t_embeddings,
                       [img],
                       normalize=True)[0]
for i in range(9):
    print(matches[i])
Upvotes: 0
Views: 429
Reputation: 2056
By definition
BLEU (BiLingual Evaluation Understudy) is a metric for automatically evaluating machine-translated text. The BLEU score is a number between zero and one that measures the similarity of the machine-translated text to a set of high quality reference translations.
And it can be measured using a generated text and at least one reference text:
from nltk.translate.bleu_score import sentence_bleu

reference = [
    'this is a dog'.split(),
    'it is dog'.split(),
    'dog it is'.split(),
    'a dog, it is'.split()
]

candidate = 'it is dog'.split()
print('BLEU score -> {}'.format(sentence_bleu(reference, candidate)))

candidate = 'it is a dog'.split()
print('BLEU score -> {}'.format(sentence_bleu(reference, candidate)))
Keep in mind that the BLEU score has nothing to do with the model itself or the process of text generation. It only comes into play after the text has been generated.
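Since you asked about corpus_bleu specifically: it is the corpus-level variant of the same metric. It takes one list of references per candidate and aggregates the n-gram counts over the whole corpus, which is not the same as averaging sentence_bleu over the sentences. A minimal sketch, where gt_captions and pred_captions are hypothetical placeholders (in your pipeline they could be df['findings'] and, say, the top-1 caption returned by find_matches for each image):
from nltk.translate.bleu_score import corpus_bleu

# Hypothetical data: one ground-truth caption and one predicted caption per image.
gt_captions = ['the heart is normal in size', 'no acute disease']
pred_captions = ['the heart is normal', 'no acute cardiopulmonary disease']

# corpus_bleu expects a list of reference sets (one list of token lists
# per hypothesis) and a list of hypothesis token lists.
list_of_references = [[caption.split()] for caption in gt_captions]
hypotheses = [caption.split() for caption in pred_captions]

print('Corpus BLEU -> {}'.format(corpus_bleu(list_of_references, hypotheses)))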
There is another option for evaluating a language model on a corpus which I think suits you better here. It is called perplexity and can be briefly defined as:
In natural language processing, perplexity is a way of evaluating language models. A language model is a probability distribution over entire sentences or texts.
In terms of time and processing power, it can outperform the BLEU score. The only point to consider here is that the perplexity calculation takes place during the generation process itself.
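To make that concrete: perplexity is just the exponential of the average per-token negative log-likelihood, so it falls out of the cross-entropy loss the model already computes while generating. A minimal sketch with hypothetical per-token probabilities (token_probs is a made-up placeholder for whatever probability your model assigns to each generated token):
import math

# Hypothetical per-token probabilities p(token | context) from a language model.
token_probs = [0.20, 0.10, 0.05, 0.30]

# Perplexity = exp(average negative log-likelihood over the tokens).
nll = [-math.log(p) for p in token_probs]
perplexity = math.exp(sum(nll) / len(nll))
print('Perplexity -> {}'.format(perplexity))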
Upvotes: 0