Reputation: 91
I have a question about running Word2Vec from Spark MLlib. I run it with a vocabulary size of ~2.4M and a corpus size of ~1.4B. Why do I get ±infinity vectors for some words? It happens when I increase the number of iterations: with 10 iterations I get a reasonable model, but with 20 iterations some vectors have the form [Infinity,-Infinity,Infinity,-Infinity,...]. Thanks in advance.
Upvotes: 9
Views: 380
Reputation: 481
You can apply this to each vector element:
def input_data(data_input: Double): Double = {
  // Replace NaN and +/-Infinity with 0; finite values pass through unchanged.
  if (data_input.isInfinity || data_input.isNaN) 0.0 else data_input
}
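To apply the same idea to whole vectors, something like the sketch below might help. It assumes the vectors are available as a `Map[String, Array[Float]]` (the shape that `Word2VecModel.getVectors` returns in Spark MLlib); the object and method names here are hypothetical, chosen only for illustration. Note that zeroing the values only hides the symptom: it does not explain why training diverged at 20 iterations, so listing the affected words first may be more informative.

```scala
object NonFiniteVectors {
  // True if any component of the vector is NaN or +/-Infinity.
  def hasNonFinite(vector: Array[Float]): Boolean =
    vector.exists(x => x.isNaN || x.isInfinity)

  // Words whose vectors contain at least one non-finite component.
  def corruptedWords(vectors: Map[String, Array[Float]]): Set[String] =
    vectors.collect { case (word, v) if hasNonFinite(v) => word }.toSet

  // Copy of the vector with every non-finite component replaced by 0.
  def sanitize(vector: Array[Float]): Array[Float] =
    vector.map(x => if (x.isNaN || x.isInfinity) 0.0f else x)
}
```

With a trained model, this could be called as `NonFiniteVectors.corruptedWords(model.getVectors)` to see which words are affected before deciding whether to zero them out or retrain with different settings.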
Upvotes: -2