Reputation: 11
I am using a sentence-transformers model to generate embeddings of image files (PIL ImageFile). However, it raises the error quoted at the end of this post. I have tried a number of things to solve it, but to no avail.
I know it has to do with the size of a tensor, so I tried truncating it, and I did some research, but I could not find a way to truncate without changing the code considerably. I suspect there is a simple solution, but I can't find it.
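For what it's worth, the only truncation switch I could find was the model-level max_seq_length attribute. Setting it ran without errors but had no visible effect, and I am not sure it even applies to the CLIP wrapper; this is just a sketch of my attempt:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("clip-ViT-B-32")
# Attempt: cap the sequence length at CLIP's 77-token limit.
# Assumption: the underlying module honors this attribute; for the
# CLIP wrapper it did not seem to change anything.
model.max_seq_length = 77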
The code scans a folder and should return the embeddings of the images in it.
import pandas as pd
from sentence_transformers import SentenceTransformer
import os
import numpy as np
from PIL import Image, ImageFile

ImageFile.LOAD_TRUNCATED_IMAGES = True

image_files = ['.jpg', '.jpeg', '.png']


class Analyzer:
    def __init__(self):
        self.image_model = SentenceTransformer("clip-ViT-B-32")

    def analyze_directory(self, path):
        files_data = []
        with os.scandir(path) as dir_iter:
            for entry in dir_iter:
                try:
                    if entry.is_file():
                        _, ext = os.path.splitext(entry.name)
                        if ext in image_files:
                            try:
                                with Image.open(os.path.join(path, entry.name)) as img:
                                    img.convert("RGB")
                                    file_data = {
                                        "Path": entry.name,
                                        "Content": img,
                                        "Type": "image"
                                    }
                            except Exception as e:
                                file_data = {
                                    "Path": entry.name,
                                    "Content": "",
                                    "Type": "image"
                                }
                        else:
                            file_data = {
                                "Path": entry.name,
                                "Content": "",
                                "Type": "unknown"
                            }
                        files_data.append(file_data)
                except Exception as e:
                    continue
        df = pd.DataFrame(files_data)
        embeddings = []
        for _, row in df.iterrows():
            if row["Type"] == "image":
                try:
                    img = img.resize((224, 224))
                    # Convert PIL Image to tensor
                    img_tensor = np.array(img)
                    # Normalize pixel values to [-1, 1] range expected by CLIP
                    img_normalized = (img_tensor / 255.0 * 2.0) - 1.0
                    img_batch = np.expand_dims(img_normalized, axis=0)
                    embedding = self.image_model.encode(str(img_batch)).numpy()[0]
                except Exception as e:
                    raise RuntimeError(f"Failed to generate image embeddings: {str(e)}")
            else:
                # Handle unknown types
                embedding = np.zeros(384)
            embeddings.append(embedding)
        embeddings = np.array(embeddings)
        return embeddings
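For completeness, this is roughly how the class is invoked; I reconstructed this sketch from the traceback below, so the import path and the target folder are assumptions, not my exact code:

# main.py (sketch reconstructed from the traceback)
from document_analyzer.analyzer import Analyzer

def main():
    analyzer = Analyzer()
    path = "C:/Users/..."  # placeholder; real path omitted
    folder_structure = analyzer.analyze_directory(path)
    print(folder_structure.shape)

if __name__ == "__main__":
    main()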
To repeat: I tried truncating the tensor, but I couldn't find a way to do it, and I thought that just pre-processing the images (the resizing and normalization above) would solve it, but it didn't.
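For reference, this is the kind of call I understood should work for this model, going by the sentence-transformers CLIP example (a minimal sketch; test.jpg is a placeholder):

from PIL import Image
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("clip-ViT-B-32")
# Per the sentence-transformers docs, the CLIP wrapper accepts PIL
# images directly and handles resizing/normalization itself.
embedding = model.encode(Image.open("test.jpg").convert("RGB"))
print(embedding.shape)  # (512,) for clip-ViT-B-32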
Error message:
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "C:\Users\...\src\document_analyzer\main.py", line 15, in <module>
main()
~~~~^^
File "C:\Users\...\src\document_analyzer\main.py", line 6, in main
folder_structure = analyzer.analyze_directory(path)
File "C:\Users\...\src\document_analyzer\analyzer.py", line 69, in analyze_directory
raise RuntimeError(f"Failed to generate image embeddings: {str(e)}")
RuntimeError: Failed to generate image embeddings: The size of tensor a (1203) must match the size of tensor b (77) at non-singleton dimension 1
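From my research, 77 appears to be the maximum sequence length of CLIP's text encoder, so I suspect my input is somehow being handled as text rather than as an image, but I don't see where that would happen.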
Upvotes: 1
Views: 36