RuntimeError: Failed to generate image embeddings: The size of tensor a (1246) must match the size of tensor b (77) at non-singleton dimension 1

Question

I am using a sentence-transformers model (clip-ViT-B-32) to generate embeddings for image files loaded as PIL images. However, it raises the error in the title. I have tried several fixes without success.

I know it has to do with the size of the tensor, so I tried truncating it. I also did some research, but could not find a way to truncate without changing the code considerably. There may be a simple solution, but I can't find it.

The code scans a folder and should return the embeddings of the images in it.

import pandas as pd
from sentence_transformers import SentenceTransformer
import os
import numpy as np
from PIL import Image, ImageFile

ImageFile.LOAD_TRUNCATED_IMAGES = True

image_files = ['.jpg', '.jpeg', '.png']

class Analyzer:

    def __init__(self):
        self.image_model = SentenceTransformer("clip-ViT-B-32")
        
    def analyze_directory(self, path):

        files_data = []
        
        with os.scandir(path) as dir_iter:
            for entry in dir_iter:
                try:
                    if entry.is_file():
                        _, ext = os.path.splitext(entry.name)
                        if ext in image_files:
                            try:
                                with Image.open(os.path.join(path, entry.name)) as img:
                                    img.convert("RGB")
                                    file_data = {
                                        "Path": entry.name,
                                        "Content": img,
                                        "Type": "image"
                                    }
                            except Exception as e:
                                file_data = {
                                    "Path": entry.name,
                                    "Content": "",
                                    "Type": "image"
                                }

                        else:
                            file_data = {
                                "Path": entry.name,
                                "Content": "",
                                "Type": "unknown"
                            }
                        
                    files_data.append(file_data)
                
                except Exception as e:
                    continue
        
        df = pd.DataFrame(files_data)

        embeddings = []
        for _, row in df.iterrows():
            if row["Type"] == "image":
                try:
                    img = img.resize((224, 224))
                    # Convert PIL Image to tensor
                    img_tensor = np.array(img)
                    # Normalize pixel values to [-1, 1] range expected by CLIP
                    img_normalized = (img_tensor / 255.0 * 2.0) - 1.0
                    img_batch = np.expand_dims(img_normalized, axis=0)
                    embedding = self.image_model.encode(str(img_batch)).numpy()[0]
                except Exception as e:
                    raise RuntimeError(f"Failed to generate image embeddings: {str(e)}")
            else:
                # Handle unknown types
                embedding = np.zeros(384)

            embeddings.append(embedding)
        
        embeddings = np.array(embeddings)
        
        return embeddings

I tried truncating the tensor, but couldn't find a way to do it.

I also thought that pre-processing the images (resizing and normalizing them) would solve it, but it didn't.

Error message:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\...\src\document_analyzer\main.py", line 15, in <module>
    main()
    ~~~~^^
  File "C:\Users\...\src\document_analyzer\main.py", line 6, in main
    folder_structure = analyzer.analyze_directory(path)
  File "C:\Users\...\src\document_analyzer\analyzer.py", line 69, in analyze_directory
    raise RuntimeError(f"Failed to generate image embeddings: {str(e)}")
RuntimeError: Failed to generate image embeddings: The size of tensor a (1203) must match the size of tensor b (77) at non-singleton dimension 1
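
A possible reading of the error: 77 is the maximum text sequence length CLIP accepts, so the model appears to be tokenizing text rather than processing an image. That would fit the call self.image_model.encode(str(img_batch)), where str() turns the pixel array into a long string of digits. Below is a minimal sketch of passing PIL images to encode() directly instead, with no manual resizing or normalizing. The names IMAGE_EXTENSIONS, collect_image_paths and embed_images are hypothetical helpers introduced for illustration, not part of sentence-transformers, and the sketch assumes the model is the clip-ViT-B-32 SentenceTransformer from the question.

```python
from pathlib import Path

# Hypothetical constant: the same extensions as in the question's code.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def collect_image_paths(folder):
    """Return the image files in `folder`, sorted, matching extensions case-insensitively."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def embed_images(paths, model):
    """Encode the images at `paths` with a CLIP SentenceTransformer model."""
    # Imported here so the path helper above works without Pillow installed.
    from PIL import Image

    images = []
    for p in paths:
        with Image.open(p) as img:
            # convert() returns a new, fully loaded image. Keep that copy,
            # because `img` itself is closed when the with-block exits.
            images.append(img.convert("RGB"))

    # Pass the PIL images themselves. Wrapping them in str() makes the
    # model treat the input as text, which runs into the 77-token limit.
    return model.encode(images)
```

Usage would be along the lines of embed_images(collect_image_paths(path), SentenceTransformer("clip-ViT-B-32")). By default encode() already returns a NumPy array, so the extra .numpy() call should not be needed. As far as I know the embeddings from this model are 512-dimensional, so the np.zeros(384) placeholder for unknown files would not stack cleanly with them.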

Answers (0)