Reputation: 21
I'm experimenting with the CLIP model. I loaded a pretrained model and wanted to see what the embeddings look like at intermediate layers. The code I used is below:
import numpy as np
import torch
import clip
from torch.utils.data import DataLoader
from tqdm import tqdm

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

dataset = CelebADataset(root_dir="celeba/img_align_celeba", transform=preprocess)
dataloader = DataLoader(dataset, batch_size=32, shuffle=False)
features = {}

def hook_fn(module, input, output):
    features[module] = output[:, 0, :]

# model.patch_embed.register_forward_hook(hook_fn)  # Don't register hook for patch_embed layer
for block in model.visual.transformer.resblocks:
    block.register_forward_hook(hook_fn)  # Hook each transformer block
all_features = {
    # 'patch_embed': [],
    'block_0': [],
    'block_1': [],
    'block_2': [],
    'block_3': [],
    'block_4': [],
    'block_5': [],
    'block_6': [],
    'block_7': [],
    'block_8': [],
    'block_9': [],
    'block_10': [],
    'block_11': [],
    'final': []
}
all_labels = []

with torch.no_grad():
    for inputs in tqdm(dataloader):
        inputs = inputs.to(device)
        final_output = model.encode_image(inputs)

        # Convert features to numpy and store
        for i in range(12):
            all_features[f'block_{i}'].append(features[model.visual.transformer.resblocks[i]].cpu().numpy())
        all_features['final'].append(final_output.cpu().numpy())
        # all_labels.append(labels.cpu().numpy())

for key in all_features:
    all_features[key] = np.concatenate(all_features[key], axis=0)
# all_labels = np.concatenate(all_labels, axis=0)
np.save("celeba_block0.npy", all_features[f'block_{0}'])
np.save("celeba_block1.npy", all_features[f'block_{1}'])
...
I had done similar things with DINO before, and after applying dimensionality reduction to DINO's embeddings I could see that images from different label groups form distinct clusters. However, when I checked the embeddings from CLIP, I didn't see clear clusters anywhere except in the final embeddings.
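For reference, this is roughly how I reduce and visualize the saved features (a minimal sketch using scikit-learn's t-SNE; the exact method and parameters here are placeholders, the behavior is the same with PCA):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Load one of the saved intermediate-layer feature files
feats = np.load("celeba_block0.npy")

# Reduce to 2D for visualization
feats_2d = TSNE(n_components=2, random_state=0).fit_transform(feats)

plt.scatter(feats_2d[:, 0], feats_2d[:, 1], s=2)
plt.title("t-SNE of block_0 features")
plt.show()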
Is this because CLIP's network structure is different from DINO's, or is my code wrong?
I tried to look at the structure of CLIP, but I couldn't figure out an explanation for this.
Upvotes: 0
Views: 68