Reputation: 1
I would like to generate a 1D feature vector from each RGB image (256 x 256 x 3) using pre-trained models. Starting from a tensor of shape (N_images, 256, 256, 3), I would like to obtain a tensor of shape (N_images, M_features), where M_features is the number of features, chosen by the user. I found a feasible solution in the keras/tensorflow documentation (see: “Extract features with VGG16”) and tried the following code (using ResNet50):
from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions
N_images= img_data.shape[0]
model = ResNet50(weights='imagenet', include_top=False, input_shape=(256,256,3))
model.summary()
img_data = preprocess_input(img_data)
res_feature = model.predict(img_data)
res_feature.shape
However, the shape of the resulting feature set is (N_images, 8, 8, 2048). Therefore, I added a GlobalAveragePooling2D layer:
from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions
from tensorflow.keras import Model, Input, regularizers
N_images= img_data.shape[0]
base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(256,256,3))
x = base_model.output
x = GlobalAveragePooling2D()(x)
model = Model(inputs=base_model.input, outputs=x)
img_data = preprocess_input(img_data)
res_feature = model.predict(img_data)
res_feature.shape
In this case, the shape of the output tensor is (N_images, 2048), which could be fine, but I would like to choose a specific number of features. Thanks in advance.
Upvotes: 0
Views: 160
Reputation: 12949
You probably want an autoencoder, which is basically a learned dimensionality reduction: it compresses the features into a smaller latent space and then reconstructs them. It's simpler than it seems (given a desired dimension M):
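import numpy as np
import tensorflow as tf
# Flatten the extracted features to shape (N_images, n_features)
res_feature = model.predict(img_data)
res_feature = np.reshape(res_feature, (len(res_feature), -1))
# Autoencoder: compress to M features, then reconstruct the original features
inputs = tf.keras.layers.Input(shape=res_feature.shape[1:])
encoded = tf.keras.layers.Dense(M, activation="selu")(inputs)
decoded = tf.keras.layers.Dense(res_feature.shape[1], activation="linear")(encoded)
autoencoder = tf.keras.models.Model(inputs=inputs, outputs=decoded)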
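Note that the model above is only defined, not trained, and the M features come from the encoder layer rather than from the final output. Here is a minimal sketch of how training and feature extraction might look. The helper name `build_autoencoder`, the random stand-in data, and the optimizer, loss, epoch and batch-size choices are all assumptions for illustration, not something taken from the question:

```python
import numpy as np
import tensorflow as tf


def build_autoencoder(n_features, m_features):
    """Hypothetical helper: returns (autoencoder, encoder) sharing the same layers."""
    inputs = tf.keras.layers.Input(shape=(n_features,))
    encoded = tf.keras.layers.Dense(m_features, activation="selu")(inputs)
    decoded = tf.keras.layers.Dense(n_features, activation="linear")(encoded)
    autoencoder = tf.keras.models.Model(inputs=inputs, outputs=decoded)
    # The encoder reuses the trained Dense layer, so it needs no separate training
    encoder = tf.keras.models.Model(inputs=inputs, outputs=encoded)
    # Assumed training setup: plain reconstruction loss with Adam
    autoencoder.compile(optimizer="adam", loss="mse")
    return autoencoder, encoder


# Random stand-in for the real (N_images, 2048) ResNet50 features
rng = np.random.default_rng(0)
res_feature = rng.normal(size=(32, 2048)).astype("float32")

M = 64  # desired number of features, chosen by the user
autoencoder, encoder = build_autoencoder(res_feature.shape[1], M)

# The autoencoder learns to reproduce its own input
autoencoder.fit(res_feature, res_feature, epochs=2, batch_size=8, verbose=0)

# The reduced features are the encoder output, with shape (N_images, M)
reduced = encoder.predict(res_feature, verbose=0)
print(reduced.shape)
```

In practice you would replace the random array with your real `res_feature` and train for more epochs, ideally watching the reconstruction loss on held-out images.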
Upvotes: 1