Reputation: 4261
I am using the Spark Word2Vec API to build word vectors. The code:
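import org.apache.spark.ml.feature.Word2Vec

// Word2Vec estimator: reads token arrays from "words", writes vectors to "features"
val w2v = new Word2Vec()
  .setInputCol("words")
  .setOutputCol("features")
  .setMinCount(5) // ignore words that appear fewer than 5 times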
But this process is very slow. I checked the Spark web UI, and two jobs were running for a long time.
My machine has a 24-core CPU and 100 GB of memory. How can I use them efficiently?
Upvotes: 0
Views: 626
Reputation: 89
I would try increasing the number of partitions in the DataFrame that you are running the feature extraction on. The stragglers are likely caused by skew in the data, which makes one node or core process most of it. If possible, distribute the data by a logical partitioning key; if not, create a random, even distribution.
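A minimal sketch of that suggestion, assuming the input is a DataFrame with an array-of-strings column named "words" as in the question. The helper name `fitWord2Vec` and the partition count are illustrative assumptions, not something from the original post. Calling `repartition` with only a number gives a round-robin (even) distribution. As far as I know, `Word2Vec` also has its own `numPartitions` parameter, which defaults to 1, so it may be worth raising that as well.

```scala
import org.apache.spark.ml.feature.{Word2Vec, Word2VecModel}
import org.apache.spark.sql.DataFrame

// Hypothetical helper: spreads the rows evenly, then fits Word2Vec.
// `numPartitions` is an assumption; a small multiple of the core count
// (for example 24 or 48 on a 24-core machine) is a common starting point.
def fitWord2Vec(df: DataFrame, numPartitions: Int): Word2VecModel = {
  // No partitioning columns given, so rows are distributed round-robin,
  // which removes skew from the input layout.
  // For logical partitioning you could instead pass a column, e.g.
  //   df.repartition(numPartitions, org.apache.spark.sql.functions.col("someKey"))
  // where "someKey" is a hypothetical column in your data.
  val evenDf = df.repartition(numPartitions)

  val w2v = new Word2Vec()
    .setInputCol("words")
    .setOutputCol("features")
    .setMinCount(5)
    // Word2Vec trains on its own number of partitions (default 1),
    // regardless of how the input DataFrame is partitioned.
    .setNumPartitions(numPartitions)

  w2v.fit(evenDf)
}
```

Usage would be something like `val model = fitWord2Vec(df, 24)`. The Spark documentation notes that a smaller `numPartitions` gives better accuracy for Word2Vec, so treat the value as a speed/quality trade-off and tune it rather than simply maximizing it.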
Upvotes: 1