d3r0n
d3r0n

Reputation: 128

Best practices for handling non decimal variables. [ACM KDD 2009 CUP]

For practice I decided to use neural network to solve problem of classification (2 classes) stated by ACM Special Interest Group on Knowledge Discovery and Data Mining at 2009 cup. The problem I have found is that the data set contains a lot of "empty" variables and I am not sure how to handle them. Furthermore second question appears. How to handle with other non decimals like strings. What are Your best practices?

Upvotes: 1

Views: 122

Answers (1)

Qnan
Qnan

Reputation: 3744

Most approaches require numerical features, so the categorical ones have to be converted into counts. E.g. if a certain string is present among the attributes of an instance, it's count is 1, otherwise 0. If it occurs more than once, it's count increases correspondingly. From this point of view any feature that is not present (or "empty" as you put it) has a count of 0. Note that the attribute names have to be unique.

Upvotes: 1

Related Questions