Reputation: 2441
Objective: Digit recognition by using Neural Networks
Description: images are normalized into 8 x 13 pixels
. For each row ever black pixel is represented by 1
and every white white 0
. Every image is thus represented by a vector of vectors as follows:
Problem: is it possible to use a vector of vectors
in Neural Networks? If not how should can the image be represented?
1
vector?Upvotes: 2
Views: 3657
Reputation: 1213
1 ) Yes combine into one vector is suitable i use this way http://vimeo.com/52775200
2) No it is not suitable because after normalization from rang ( 0-255 ) -> to range ( 0 - 1 ) differt rows gives aprox same values so lose data
Upvotes: 1
Reputation: 10995
To use multidimensional input, you'd need multidimensional neurons (which I suppose your formalism doesn't support). Sadly you didn't give any info on your network structure, which i think is your main source of problems an confusion. Whenever you evaluate a feature representation, you need to know how the input layer will be structured: If it's impractical, you probably need a different representation.
Your multidimensional vector: A network that accepts 1 image as input has only 1 (!) input node containing multiple vectors (of rows, respectively). This is the worst possible representation of your data. If we:
Think about all 3 approaches and what it does to your data. The latter approach is almost always as bad as the first approach. Neural networks work best with features. Features are not restructurings of the pixels (your row vectors). They should be META-data you can gain from the pixels: Brightness, locations where we go from back to white, bounding boxes, edges, shapes, masses of gravity, ... there's tons of stuff that can be chosen as features in image processing. You have to think about your problem and choose one (or more).
In the end, when you ask about how to "combine rows into 1 vector": You're just rephrasing "finding a feature vector for the whole image". You definitely don't want to "concatenate" your vectors and feed raw data into the network, you need to find information before you use the network. This is critical for pre-processing.
For further information on which features might be viable for OCR, just read into some papers. The most successful network atm is Convolutional Neural Network. A starting point for the topic feature extraction is here.
Upvotes: 1
Reputation: 39390
Combining them into one vector simply by concatenation is certainly possible. In fact, you should notice that arbitrary reordering of the data doesn't change the results, as long as it's consistent between training and classification.
As to your second approach, I think (I am really not sure) you might lose some information that way.
Upvotes: 2