Spark Naive Bayes model persistence : understanding pi & theta

Question

I am working on a Naive Bayes implementation using Spark 2.0. Model tuning is done, but I am stuck on persistence of the model. I am well aware of the model persistence support in Spark 2; my concern is with the content of the saved Naive Bayes model, particularly the data folder. It stores pi (a vector), whose size depends on the number of classes, and theta (a matrix), whose size depends on the number of classes and the number of features set for Naive Bayes. In short, the content of the model's data folder depends on the actual data and will grow with data size.
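
As a rough illustration of the size concern, the dimensions quoted from the Spark docs imply C values for pi and C x D values for theta. The sketch below only does that arithmetic. The function name, the example sizes, and the figure of 8 bytes per value (one double, ignoring any file-format overhead or compression) are assumptions for illustration, not a description of Spark's on-disk format.

```python
def naive_bayes_param_count(num_classes: int, num_features: int) -> int:
    """Count the values in pi (C entries) plus theta (C x D entries)."""
    return num_classes + num_classes * num_features


# Hypothetical sizes: 3 classes and 10,000 features.
count = naive_bayes_param_count(3, 10_000)

# Assumption: each value is stored as one 8-byte double.
approx_bytes = count * 8

print(count, approx_bytes)
```

By this count the total depends only on C and D, so whether it tracks the volume of training data depends on whether more data brings more classes or features.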

Can anyone help me understand exactly what it stores? I need this to decide where to put this data in my production architecture.

I searched a lot for information on these but still don't understand exactly what they are. In the Spark Java docs they are described as:

@param pi log of class priors, whose dimension is C (number of classes)
@param theta log of class conditional probabilities, whose dimension is C (number of classes) by D (number of features)

However, I am not able to understand what exactly these values are and why they are needed. It would be helpful if anyone could explain them.
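
For reference, in the textbook multinomial Naive Bayes formulation, the log priors and log conditional probabilities are combined into a per-class score, pi[c] + sum_j theta[c][j] * x[j], and the class with the highest score is predicted. The following is a minimal plain-Python sketch of that rule under that assumption. It is not Spark's code, and the function name and toy numbers are made up for illustration.

```python
import math


def predict(pi, theta, x):
    """Return (best_class_index, scores) for feature counts x.

    pi    : list of C log class priors
    theta : C lists of D log class-conditional probabilities
    x     : list of D feature values (e.g. term counts)
    """
    scores = [
        pi_c + sum(t * v for t, v in zip(theta_c, x))
        for pi_c, theta_c in zip(pi, theta)
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy model: 2 classes, 2 features (hypothetical numbers).
pi = [math.log(0.5), math.log(0.5)]
theta = [
    [math.log(0.8), math.log(0.2)],  # class 0 favours feature 0
    [math.log(0.3), math.log(0.7)],  # class 1 favours feature 1
]

x = [3, 1]  # feature 0 seen three times, feature 1 once
label, scores = predict(pi, theta, x)
print(label, scores)
```

In this formulation, pi and theta are all that is needed at prediction time, which would explain why they are what a saved model contains.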

The question also relates to the fact that they appear to have been added in version 2.0, so before that, in 1.6, it would presumably have been working without pi & theta.

Answers (1)
