user3450211

Reputation: 151

Neural Network Categorical Data Implementation

I've been learning to work with neural networks as a hobby project, but am at a complete loss with how to handle categorical data. I read the article http://visualstudiomagazine.com/articles/2013/07/01/neural-network-data-normalization-and-encoding.aspx, which explains normalization of the input data and explains how to preprocess categorical data using effects encoding. I understand the concept of breaking the categories into vectors, but have no idea how to actually implement this.

For example, if I'm using countries as categorical data (e.g. Finland, Thailand), would I process the resulting vector into a single number to be fed to a single input, or would I have a separate input for each component of the vector? Under the latter, if there are 196 different countries, I would need 196 different inputs just to process this one piece of data. If a lot of different categorical data is being fed to the network, I can see this becoming unwieldy very fast.

Is there something I'm missing? How exactly is categorical data mapped to neuron inputs?

Upvotes: 15

Views: 11478

Answers (1)

jorgenkg

Reputation: 4285

Neural network inputs

As a rule of thumb: different classes and categories should have their own input signals.


Why you can't encode it with a single input

A neuron applies its activation function to a weighted sum of its inputs, so a larger input value contributes proportionally more to that sum. With a positive weight, a higher input value makes the neuron more likely to fire.

Unless you want to tell the network that Thailand is "better" than Finland, you should not encode the country as a single number such as InputValue(Finland) = 24, InputValue(Thailand) = 140.
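To make the problem concrete, here is a minimal sketch of a single sigmoid neuron fed the integer codes from the example. The weight and bias are arbitrary values chosen for illustration, and `neuron_output` is a hypothetical helper, not part of any library.

```python
import math


def neuron_output(x, weight=0.05, bias=0.0):
    """Sigmoid activation of one weighted input (weight and bias are assumed values)."""
    return 1.0 / (1.0 + math.exp(-(weight * x + bias)))


# Arbitrary integer codes taken from the example above.
finland_code = 24
thailand_code = 140

out_finland = neuron_output(finland_code)
out_thailand = neuron_output(thailand_code)

# With a positive weight, the larger code always yields the larger activation,
# even though the codes carry no real ordering between the countries.
print(out_finland, out_thailand)
```

The network can still learn around this in principle, but it has to spend capacity undoing an ordering that was never in the data.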

[Image: how not to format the input]


How it should be encoded

Each country deserves its own input signal so that they contribute equally to activating the neurons: the input for the selected country is set to 1 and the inputs for all other countries are set to 0.
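The one-input-per-country scheme can be sketched in plain Python as follows. The helper name `one_hot` and the three-country list are assumptions made for illustration; a real data set would list every country it contains.

```python
def one_hot(category, categories):
    """Return a list with 1.0 at the position of `category` and 0.0 elsewhere."""
    if category not in categories:
        raise ValueError(f"unknown category: {category!r}")
    return [1.0 if c == category else 0.0 for c in categories]


# Hypothetical, shortened category list; the order only fixes which input is which.
countries = ["Finland", "Thailand", "Norway"]

vec = one_hot("Thailand", countries)
print(vec)
```

Each element of `vec` feeds one input neuron, so 196 countries do mean 196 inputs. Only one of them is non-zero for any given sample.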

Upvotes: 28
