Reputation: 384
I have a question about Naive Bayes with a mix of numeric and non-numeric features. For example, I have 5 independent parameters, and I want to classify the data based on them.
Male,Suspicion of Alcohol,Weekday,12am-4am,75,30-39
Male,Moving Traffic Violation,Weekday,12am-4am,0,20-24
Male,Suspicion of Alcohol,Weekend,4am-8am,12,40-49
Male,Suspicion of Alcohol,Weekday,12am-4am,0,50-59
Female,Road Traffic Collision,Weekend,12pm-4pm,0,20-24
Male,Road Traffic Collision,Weekday,12pm-4pm,0,25-29
Male,Road Traffic Collision,Weekday,8pm-12pm,0,Other
Male,Other,Weekday,8am-12pm,23,60-69
Male,Moving Traffic Violation,Weekend,12pm-4pm,26,30-39
Female,Road Traffic Collision,Weekend,4am-8am,61,16-19
Male,Moving Traffic Violation,Weekend,4pm-8pm,74,25-29
Male,Road Traffic Collision,Weekday,12am-4am,0,Other
Male,Moving Traffic Violation,Weekday,8pm-12pm,0,16-19
Male,Road Traffic Collision,Weekday,8pm-12pm,0,Other
Male,Moving Traffic Violation,Weekend,4am-8am,0,30-39
As you can see, some parameters are numeric and some are non-numeric. Does anyone know how to convert the non-numeric data to numeric data?
Upvotes: 1
Views: 76
Reputation: 63072
You can start with the following:
Convert each of the features to a categorical value by applying a factorizer. An example:
Feature1: Male = 0, Female = 1
and so on.
Each distinct value in a "column" should get its own numeric code in your factorized result. Hopefully ranges like 4pm-8pm are non-overlapping; if they do overlap, you can ignore that detail at first and do some more intelligent manual featurization later if time allows.
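The factorizing step could be sketched roughly as below. This is only an illustration in plain Python, not a specific library's factorizer: the helper name `factorize`, the choice of three sample rows, and the assumption that the fifth field is the only numeric one are all mine.

```python
def factorize(values):
    """Map each distinct value to an integer code, in order of first appearance."""
    codes = {}
    encoded = []
    for value in values:
        if value not in codes:
            codes[value] = len(codes)
        encoded.append(codes[value])
    return encoded, codes


# Three rows copied from the question, used as a small sample.
rows = [
    "Male,Suspicion of Alcohol,Weekday,12am-4am,75,30-39",
    "Male,Moving Traffic Violation,Weekday,12am-4am,0,20-24",
    "Female,Road Traffic Collision,Weekend,12pm-4pm,0,20-24",
]

# Assumption: only the fifth field (index 4) is already numeric.
NUMERIC_COLUMNS = {4}

table = [row.split(",") for row in rows]
columns = list(zip(*table))  # transpose: one tuple per column

encoded_columns = []
mappings = {}  # column index -> {original value: integer code}
for index, column in enumerate(columns):
    if index in NUMERIC_COLUMNS:
        encoded_columns.append([int(value) for value in column])
    else:
        encoded, mapping = factorize(column)
        encoded_columns.append(encoded)
        mappings[index] = mapping

# Transpose back: one numeric feature vector per input line.
feature_vectors = [list(vector) for vector in zip(*encoded_columns)]

for vector in feature_vectors:
    print(vector)
```

Keeping `mappings` around lets you encode new rows with the same codes later. Note that the codes are just labels: Male = 0 and Female = 1 carry no ordering or magnitude, so whatever you run on them should treat them as categories.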
Each entry/line in your input consists of six comma-separated fields ("features"), so you can create a feature vector out of each line. The results are now tf-idf ready (TM). You can then apply the NB algorithm to your newly minted feature vectors and find relative similarities.
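For the NB step, here is a minimal sketch of a categorical Naive Bayes with add-one smoothing, written in plain Python. Several things in it are assumptions for illustration only: the question does not say which field is the class label, so I treat the second field (the reason) as the label; the numeric field is reduced to "zero" vs. "non-zero" so that every feature is categorical; and the function names `train_nb` and `predict_nb` are made up.

```python
import math
from collections import Counter


def train_nb(X, y, alpha=1.0):
    """Count class and per-feature value frequencies for categorical Naive Bayes."""
    n_features = len(X[0])
    class_counts = Counter(y)
    # value_counts[c][j][v] = number of class-c rows whose feature j equals v
    value_counts = {c: [Counter() for _ in range(n_features)] for c in class_counts}
    # Distinct values seen per feature, used in the smoothing denominator.
    vocab = [set() for _ in range(n_features)]
    for row, label in zip(X, y):
        for j, value in enumerate(row):
            value_counts[label][j][value] += 1
            vocab[j].add(value)
    return {
        "alpha": alpha,
        "n": len(y),
        "class_counts": class_counts,
        "value_counts": value_counts,
        "vocab": vocab,
    }


def predict_nb(model, row):
    """Return the class with the highest smoothed log-posterior for one row."""
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, n_c in model["class_counts"].items():
        score = math.log(n_c / model["n"])  # log prior
        for j, value in enumerate(row):
            count = model["value_counts"][label][j][value]  # 0 if never seen
            k = len(model["vocab"][j])
            score += math.log((count + alpha) / (n_c + alpha * k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# First six rows of the question, hand-encoded under the assumptions above.
# Features: [gender, day type, time band, reading > 0]
#   gender: Male=0, Female=1; day type: Weekday=0, Weekend=1
#   time band: 12am-4am=0, 4am-8am=1, 12pm-4pm=2
X = [
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 2, 0],
    [0, 0, 2, 0],
]
y = [
    "Suspicion of Alcohol",
    "Moving Traffic Violation",
    "Suspicion of Alcohol",
    "Suspicion of Alcohol",
    "Road Traffic Collision",
    "Road Traffic Collision",
]

model = train_nb(X, y)
print(predict_nb(model, [0, 0, 0, 1]))  # Male, Weekday, 12am-4am, non-zero reading
```

Because the model only counts how often each code occurs per class, the integer codes are treated purely as category labels. If you would rather keep the numeric field as a real number, a Gaussian likelihood for that one feature is a common alternative to bucketing it.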
Upvotes: 0