Reputation: 75
The class-conditional probability in naive Bayes is calculated, as a base-2 logarithm, as
$$\log_2 P(t \mid c) = \log_2 \frac{n_1 + 1}{n_2 + n_3}$$
Where
Which is faster: doing the calculation in MySQL, or in Java (which, of course, means first fetching the data from MySQL)?
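For reference, the formula can be sketched in Java as below. The question does not say what n1, n2 and n3 stand for. This sketch assumes the usual add-one (Laplace) smoothing reading: n1 is the number of times term t occurs in class c, n2 is the total number of term occurrences in class c, and n3 is the vocabulary size. The class and method names are hypothetical.

```java
/**
 * Add-one (Laplace) smoothed class-conditional log probability, base 2.
 * ASSUMED meaning of the symbols (the question does not define them):
 *   n1 = occurrences of term t in documents of class c
 *   n2 = total term occurrences in class c
 *   n3 = vocabulary size (distinct terms across all classes)
 */
public final class LogProb {

    private LogProb() {}

    // Java has no Math.log2, so use the change-of-base rule.
    public static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    // Returns log2 P(t|c) = log2((n1 + 1) / (n2 + n3)).
    public static double log2TermGivenClass(long n1, long n2, long n3) {
        if (n1 < 0 || n2 < 0 || n3 <= 0) {
            throw new IllegalArgumentException("counts must be non-negative and n3 > 0");
        }
        return log2((n1 + 1.0) / (n2 + n3));
    }

    public static void main(String[] args) {
        // Hypothetical counts: t occurs 3 times in class c, class c has
        // 10 term occurrences, and the vocabulary has 6 distinct terms.
        // Mathematically, log2(4 / 16) = -2.
        System.out.println(log2TermGivenClass(3, 10, 6));
    }
}
```

Storing logs rather than raw probabilities lets a classifier add per-term scores instead of multiplying many small numbers, which avoids floating-point underflow on long documents.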
Upvotes: 0
Views: 307
Reputation: 1271151
The Naive Bayes classifier is computationally simple, but it requires a lot of data manipulation. When it is applied to text, you are generally looking for many different terms within the text.
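To make that data manipulation concrete, here is a minimal in-memory Java sketch of the application-side option the question asks about. Training tallies term counts per class, and classification sums a base-2 log prior plus log2((n1 + 1) / (n2 + n3)) for each term. It assumes n1 is the term's count in the class, n2 the total term count in the class, and n3 the vocabulary size (the question does not define them). Whitespace tokenization and all names are hypothetical, and because everything is held in memory it is not meant for a corpus of tens or hundreds of gigabytes.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory multinomial naive Bayes with add-one smoothing. */
public final class NaiveBayesSketch {

    // termCounts.get(c).get(t) = occurrences of term t in class c (assumed n1)
    private final Map<String, Map<String, Long>> termCounts = new HashMap<>();
    // totalTerms.get(c) = total term occurrences in class c (assumed n2)
    private final Map<String, Long> totalTerms = new HashMap<>();
    // docCounts.get(c) = training documents labeled c, used for the prior
    private final Map<String, Long> docCounts = new HashMap<>();
    // distinct training terms; its size is the vocabulary size (assumed n3)
    private final Set<String> vocabulary = new HashSet<>();
    private long totalDocs = 0;

    private static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    // Naive tokenizer: lowercase, split on whitespace.
    private static String[] tokenize(String text) {
        String trimmed = text.toLowerCase().trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public void train(String label, String text) {
        totalDocs++;
        docCounts.merge(label, 1L, Long::sum);
        Map<String, Long> counts = termCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String t : tokenize(text)) {
            counts.merge(t, 1L, Long::sum);
            totalTerms.merge(label, 1L, Long::sum);
            vocabulary.add(t);
        }
    }

    /** Returns the class maximizing log2 P(c) + sum of log2 P(t|c), or null if untrained. */
    public String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        long n3 = Math.max(1, vocabulary.size()); // guard against an empty vocabulary
        for (String c : docCounts.keySet()) {
            double score = log2(docCounts.get(c).doubleValue() / totalDocs);
            Map<String, Long> counts = termCounts.get(c);
            long n2 = totalTerms.getOrDefault(c, 0L);
            for (String t : tokenize(text)) {
                long n1 = counts.getOrDefault(t, 0L);
                score += log2((n1 + 1.0) / (n2 + n3));
            }
            if (score > bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NaiveBayesSketch nb = new NaiveBayesSketch();
        nb.train("sports", "the match ended with a late goal");
        nb.train("sports", "the team scored a goal in the match");
        nb.train("tech", "the database query ran slowly");
        nb.train("tech", "index the table to speed up the query");
        System.out.println(nb.classify("goal in the final match")); // sports
    }
}
```

The per-class term counts that `train` builds are the same tallies a `GROUP BY` query over a table of (class, term) rows would produce in MySQL; the sketch only shows the arithmetic, not which side should do it.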
I have a natural bias toward doing these types of calculations in SQL, and I would at least argue that MySQL is a reasonable environment for them. Depending on the exact nature of the problem and the structure of your data, you might find full-text indexing helpful. I would be wary of working with a large corpus (many tens or hundreds of gigabytes) on the application side. My book "Data Analysis Using SQL and Excel" has a chapter devoted to Naive Bayes and similar types of models.
Upvotes: 1