jefferyk

Reputation: 33

Deep learning - use both images and their description

I am building a classifier to categorize images, and I know that a convolutional neural network is the usual choice for this. However, for every image I also have a text description. Is there any way to use this description to improve the classifier?

Upvotes: 3

Views: 755

Answers (2)

Prophecies

Reputation: 723

The easiest approach is to use both image features (from a CNN) and text features (from an LSTM language model, a bag-of-words representation, or an off-the-shelf encoder such as skip-thought vectors), and train the network to predict the image class in the usual way. The two feature vectors can be combined by concatenation, element-wise multiplication, element-wise sum, or outer product. Take a look at recent progress in visual question answering (VQA); what you're describing sounds like a subset of what can be done with VQA.
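To make the four ways of combining features concrete, here is a minimal sketch in plain Python. The function names and the tiny example vectors are made up for illustration; in a real model the vectors would come from the CNN and the text encoder, and the fused vector would feed a classification layer trained together with both encoders.

```python
from typing import List

Vector = List[float]


def _check_same_length(a: Vector, b: Vector) -> None:
    # Element-wise operations only make sense for equal-length vectors.
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")


def fuse_concat(img: Vector, txt: Vector) -> Vector:
    # Concatenation: works for any two lengths, output length is the sum.
    return list(img) + list(txt)


def fuse_product(img: Vector, txt: Vector) -> Vector:
    # Element-wise multiplication: requires equal lengths.
    _check_same_length(img, txt)
    return [a * b for a, b in zip(img, txt)]


def fuse_sum(img: Vector, txt: Vector) -> Vector:
    # Element-wise sum: requires equal lengths.
    _check_same_length(img, txt)
    return [a + b for a, b in zip(img, txt)]


def fuse_outer(img: Vector, txt: Vector) -> Vector:
    # Outer product, flattened row by row: output length is len(img) * len(txt).
    return [a * b for a in img for b in txt]


# Hypothetical feature vectors standing in for CNN and text-encoder outputs.
image_features = [0.5, 2.0, -1.0]
text_features = [4.0, 0.5, 2.0]

concat = fuse_concat(image_features, text_features)
product = fuse_product(image_features, text_features)
summed = fuse_sum(image_features, text_features)
outer = fuse_outer(image_features, text_features)
```

Concatenation is the least restrictive choice because the two encoders can have different output sizes. The outer product captures every pairwise interaction, but its size grows as the product of the two lengths, so it is usually only practical for small feature vectors.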

Upvotes: 1

Thomas Pinetz

Reputation: 7148

Sure. Neural networks have been used on text, for example in https://arxiv.org/pdf/1609.08144v2.pdf. You only want to output classes, not sentences, so you have an easier task than they did. To combine the classifiers, you could use a weighted rank sum on their outputs.
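One possible reading of a weighted rank sum is sketched below in plain Python. It assumes two separately trained classifiers (one on images, one on descriptions) that each return one score per class. The rank convention, the scores, and the weights are all assumptions made for illustration; in practice the weights would be tuned on a validation set.

```python
from typing import List


def ranks(scores: List[float]) -> List[int]:
    # Rank 0 goes to the lowest score, len(scores) - 1 to the highest.
    # Ties are broken by class index, because sorted() is stable.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0] * len(scores)
    for position, class_index in enumerate(order):
        result[class_index] = position
    return result


def weighted_rank_sum(score_lists: List[List[float]],
                      weights: List[float]) -> List[float]:
    # Sum each classifier's per-class ranks, scaled by that classifier's weight.
    if len(score_lists) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    n_classes = len(score_lists[0])
    combined = [0.0] * n_classes
    for scores, weight in zip(score_lists, weights):
        if len(scores) != n_classes:
            raise ValueError("all classifiers must score the same classes")
        for class_index, rank in enumerate(ranks(scores)):
            combined[class_index] += weight * rank
    return combined


# Hypothetical per-class scores for a three-class problem.
image_scores = [0.5, 0.3, 0.2]  # image classifier prefers class 0
text_scores = [0.1, 0.3, 0.6]   # text classifier prefers class 2

# Hypothetical weights: trust the text classifier twice as much.
combined = weighted_rank_sum([image_scores, text_scores], [1.0, 2.0])
predicted = max(range(len(combined)), key=lambda i: combined[i])
```

Working with ranks rather than raw scores avoids having to calibrate the two classifiers against each other, at the cost of discarding how confident each one is.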

How much the classifier improves sounds very interesting to me and could be the basis for a publication.

Upvotes: 0
