Reputation: 23
As far as I know, pre-trained models work well as feature extractors in many tasks, thanks to their abundant training data. However, I'm wondering whether a model, let's say VGG-16, has some ability to extract "semantic" information from an input image. If so, given an unlabeled dataset, is it possible to "cluster" images by measuring the semantic similarity of the extracted features?
Actually, I've spent some effort on this. Given a tensor X of size (5000, 3, 224, 224), I extract features like this:

def extract(vgg, X):                                 # X: (5000, 3, 224, 224)
    features = vgg.features(X).view(X.shape[0], -1)  # features: (5000, 25088)
    features = vgg.classifier(features)              # features: (5000, 4096)
    return features
Then I measured pairwise similarity between the features with cosine similarity, the inner product, and torch.cdist (a minimal sketch of these measurements follows below), only to find several bad clusters. Any suggestions? Thanks in advance.
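For reference, this is roughly how I computed the similarities; the random tensor is just a stand-in for the extracted features above:

import torch

features = torch.rand(5000, 4096)  # stand-in for the extracted VGG features

feats = torch.nn.functional.normalize(features, dim=1)  # L2-normalize rows
cos_sim = feats @ feats.T                               # (5000, 5000) cosine similarities
dot_sim = features @ features.T                         # raw inner products
dists = torch.cdist(features, features)                 # pairwise Euclidean distances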
Upvotes: 0
Views: 697
Reputation: 40738
You might not want to go all the way to the last layer, since the final layers contain features specific to the classification task at hand. Using features from earlier layers of the classifier might help. Additionally, you want to switch to eval mode, since VGG-16 has dropout layers in its classifier.
>>> import torchvision
>>> vgg16 = torchvision.models.vgg16(pretrained=True).eval()
Truncate the classifier:
>>> vgg16.classifier = vgg16.classifier[:4]
Now vgg16's classifier will look like:
(classifier): Sequential(
  (0): Linear(in_features=25088, out_features=4096, bias=True)
  (1): ReLU(inplace=True)
  (2): Dropout(p=0.5, inplace=False)
  (3): Linear(in_features=4096, out_features=4096, bias=True)
)
Then extract the features:
>>> vgg16(torch.rand(1, 3, 124, 124)).shape
torch.Size([1, 4096])
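Putting it together, here is a minimal sketch of how the truncated features could be clustered. It assumes scikit-learn is available and uses a random stand-in batch; the cluster count of 10 is an arbitrary example value:

import torch
import torchvision
from sklearn.cluster import KMeans  # assumes scikit-learn is installed

# Load VGG-16 and truncate the classifier as above
vgg16 = torchvision.models.vgg16(pretrained=True).eval()
vgg16.classifier = vgg16.classifier[:4]

# Stand-in batch; replace with your own (N, 3, 224, 224) images
X = torch.rand(100, 3, 224, 224)

with torch.no_grad():  # no gradients needed for feature extraction
    feats = vgg16(X)   # (100, 4096)

# L2-normalize so Euclidean distance corresponds to cosine similarity
feats = torch.nn.functional.normalize(feats, dim=1)

# Cluster the normalized features
labels = KMeans(n_clusters=10, n_init=10).fit_predict(feats.numpy())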
Upvotes: 1