dulevw

Reputation: 85

Which WEKA Classifier for probability?

I have the following problem: I have a dataset (ARFF) that stores three attributes: character, key holdtime, and user. From this information, I have to calculate the probability that a given person is the one typing on the keyboard.

When a person types on the keyboard, the same information as above is extracted (character, key holdtime, user) and "compared" with the ARFF file. The result should be as follows: I have a dataset for the user "John" in the ARFF file. Later, a user enters the username "John" and writes a text. The result should be the probability that this user's typing matches the data stored for "John" in the ARFF file, for example: with 90% probability it is the right person, i.e. it is John.

I hope I could explain my problem. My question is: which classifier should I use in this case? I tried IBk, but with 15 persons the probability is divided among 15 classes and I get small probabilities. The probability depends on the number of persons stored in the ARFF file. Or should I multiply the result by the number of persons to get the real probability?

Upvotes: 1

Views: 174

Answers (1)

AlbertoD

Reputation: 146

Note: the sum of all the probabilities of a distribution has to be 1.

It is partly true that you get "small probabilities" when you have more classes, but NOT because the result is divided by the number of classes. So you won't get the probability you want by multiplying the result by the number of classes: the product is not a probability anymore (it could easily become >1).
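A quick numeric check of this, using a made-up distribution over 15 users (the numbers are purely illustrative, not taken from your data):

```python
# Hypothetical class distribution over 15 stored users (illustrative numbers only).
n_users = 15
p_john = 0.40
# Assumption: the remaining probability mass is spread evenly over the other 14 users.
p_other = (1.0 - p_john) / (n_users - 1)
distribution = [p_john] + [p_other] * (n_users - 1)

total = sum(distribution)        # a valid distribution sums to 1
scaled_john = p_john * n_users   # "multiply by the number of persons"

print(round(total, 6))
print(scaled_john)               # far above 1, so it cannot be a probability
```

Here the classifier is already fairly sure it is John, and scaling by 15 pushes the value far outside the range [0, 1].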


The probability distribution you obtained with IBk answers a different question from the one you asked: it tells you which of the stored users is most similar to the current user (probability of being John vs. being Paul vs. being Sarah, etc.), independently of the name he gave.


The output you want is the result of a binary classifier, but you'll need to train a classifier for every user you stored.

The training set of each classifier will be similar to the dataset you already have, but (in the case of John) it will have an isJohn column instead of user, and this new column will contain true if the user was John and false otherwise.

EDIT

    character, key holdtime,    user
           90,        150ms,    John
           70,        120ms,   Sarah
          100,        110ms,    Paul

will become

    character, key holdtime,  isJohn
           90,        150ms,    true
           70,        120ms,   false
          100,        110ms,   false

The output distribution of this classifier is "is John" vs. "is not John".
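The relabeling step can be sketched in a few lines. This is plain Python rather than WEKA, and `relabel_for_user` is a hypothetical helper name; hold times are stored as plain integers in milliseconds:

```python
def relabel_for_user(rows, target_user):
    """Replace the user column with a boolean 'is target_user' label."""
    return [
        (character, hold_ms, user == target_user)
        for character, hold_ms, user in rows
    ]

# Same three rows as in the example table (character code, hold time in ms, user).
rows = [
    (90, 150, "John"),
    (70, 120, "Sarah"),
    (100, 110, "Paul"),
]

john_rows = relabel_for_user(rows, "John")
for row in john_rows:
    print(row)
```

Calling the same helper with "Sarah" or "Paul" gives the training set for that user's classifier.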

To get exactly the output you want, you must train a classifier for each stored user and call the right one depending on the name the current user gave.
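As a rough sketch of that setup: the toy nearest-neighbour function below stands in for whatever WEKA classifier you end up using, and the data, `build_models`, and `knn_probability` are all made up for illustration. The point is only that each model has two classes, so its output does not shrink as more users are stored.

```python
import math

def knn_probability(train, sample, k=3):
    """Fraction of the k nearest training rows that are labelled True."""
    nearest = sorted(train, key=lambda row: math.dist(row[:2], sample))[:k]
    return sum(1 for row in nearest if row[2]) / len(nearest)

def build_models(rows):
    """One binary training set per stored user: (character, hold_ms, is_user)."""
    users = sorted({user for _, _, user in rows})
    return {u: [(c, h, user == u) for c, h, user in rows] for u in users}

# Hypothetical data: character code, key hold time in ms, user.
rows = [
    (90, 150, "John"), (90, 148, "John"), (90, 152, "John"),
    (90, 120, "Sarah"), (90, 118, "Sarah"), (90, 122, "Sarah"),
    (90, 100, "Paul"), (90, 98, "Paul"), (90, 102, "Paul"),
]
models = build_models(rows)

claimed_user = "John"   # the name the current user entered
sample = (90, 149)      # one observed keystroke: (character, hold_ms)

# Only the model of the claimed user is consulted.
p = knn_probability(models[claimed_user], sample)
print(p)
```

In practice you would score a whole typed text rather than a single keystroke, and probably scale the attributes before measuring distances.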


As for which classifier to use, I don't think there is a way to know in advance which is best for your case. I usually try several classifiers and choose the one that performs best.

Upvotes: 1
