6:[["$","$Le",null,{}],["$","div",null,{"className":"min-h-screen bg-gray-100 p-6","children":[["$","$Lf",null,{}],["$","script",null,{"type":"application/ld+json","dangerouslySetInnerHTML":{"__html":"{\"@context\":\"https://schema.org\",\"@type\":\"QAPage\",\"mainEntity\":{\"@type\":\"Question\",\"name\":\"How to speed up training Word2vec model with spark?\",\"text\":\"

I am using the Spark Word2Vec API to build word vectors. The code:

\\n\\n

val w2v = new Word2Vec()\\n            .setInputCol(\\\"words\\\")\\n            .setOutputCol(\\\"features\\\")\\n            .setMinCount(5)\\n

\\n\\n

However, training is very slow. The Spark web UI shows two jobs that run for a long time.

\\n\\n

My machine has a 24-core CPU and 100 GB of memory. How can I use them efficiently?

\\n\",\"author\":{\"@type\":\"Person\",\"name\":\"Ivan Lee\"},\"upvoteCount\":0,\"answerCount\":1,\"acceptedAnswer\":null}}"}}],["$","div",null,{"className":"bg-white shadow-md rounded-lg p-6 mb-6 relative","children":[["$","div",null,{"className":"absolute top-4 right-4 flex flex-wrap space-x-2","children":[["$","span","apache-spark",{"className":"bg-blue-600 text-white text-sm px-3 py-1 rounded-full","children":["$","$L10",null,{"href":"/discussion/tag/apache-spark/1","children":"apache-spark"}]}],["$","span","apache-spark-mllib",{"className":"bg-blue-600 text-white text-sm px-3 py-1 rounded-full","children":["$","$L10",null,{"href":"/discussion/tag/apache-spark-mllib/1","children":"apache-spark-mllib"}]}]]}],["$","div",null,{"className":"flex items-center mb-4","children":[["$","img",null,{"src":"https://www.gravatar.com/avatar/c019fa0c00350ce2ad773c9f2f398b71?s=256&d=identicon&r=PG&f=y&so-version=2","alt":"Ivan Lee","className":"w-16 h-16 rounded-full border"}],["$","div",null,{"className":"ml-4","children":[["$","a",null,{"href":"https://stackoverflow.com/users/2281101/ivan-lee","target":"_blank","rel":"noopener noreferrer","className":"text-lg font-semibold text-blue-600 hover:underline","children":"Ivan Lee"}],["$","p",null,{"className":"text-sm text-gray-500","children":["Reputation: ",4261]}]]}]]}],["$","h1",null,{"className":"text-2xl font-bold text-gray-800 mb-4","children":"How to speed up training Word2vec model with spark?"}],["$","p",null,{"className":"text-gray-700 mt-4","dangerouslySetInnerHTML":{"__html":"

I am using the Spark Word2Vec API to build word vectors. The code:

\n\n

val w2v = new Word2Vec()\n            .setInputCol(\"words\")\n            .setOutputCol(\"features\")\n            .setMinCount(5)\n

\n\n

However, training is very slow. The Spark web UI shows two jobs that run for a long time.

\n\n

My machine has a 24-core CPU and 100 GB of memory. How can I use them efficiently?

\n"}}],["$","div",null,{"className":"text-gray-600 text-sm mt-4","children":[["$","p",null,{"children":["Upvotes: ",0]}],["$","p",null,{"children":["Views: ",626]}]]}]]}],["$","div",null,{"className":"container mx-auto","children":[["$","h2",null,{"className":"text-2xl font-semibold text-gray-800 mb-6","children":["Answers (",1,")"]}],[["$","div","57462183",{"className":"bg-white shadow-md rounded-lg p-6 mb-6","children":[["$","div",null,{"className":"flex items-center mb-4","children":[["$","img",null,{"src":"https://www.gravatar.com/avatar/19694484bb28e2e08a146abd86edb640?s=256&d=identicon&r=PG&f=y&so-version=2","alt":"Mi7flat5","className":"w-12 h-12 rounded-full border"}],["$","div",null,{"className":"ml-4","children":[["$","a",null,{"href":"https://stackoverflow.com/users/5022965/mi7flat5","target":"_blank","rel":"noopener noreferrer","className":"text-lg font-semibold text-blue-600 hover:underline","children":"Mi7flat5"}],["$","p",null,{"className":"text-sm text-gray-500","children":["Reputation: ",89]}]]}]]}],["$","p",null,{"className":"text-gray-700 mb-4","dangerouslySetInnerHTML":{"__html":"

I would try increasing the number of partitions in the DataFrame that you are running the feature extraction on. The stragglers are likely due to skew in the data, which causes most of the data to be processed by a single node or core. If possible, distribute the data by logical partitioning; if not, create a random, even distribution.

\n"}}],["$","div",null,{"className":"text-gray-600 text-sm","children":["$","p",null,{"children":["Upvotes: ",1]}]}]]}]]]}],["$","div",null,{"className":"bg-white shadow-md rounded-lg p-6 mt-6","children":[["$","h2",null,{"className":"text-2xl font-semibold text-gray-800 mb-4","children":"Related Questions"}],["$","ul",null,{"className":"list-disc list-inside","children":[["$","li","34377742",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/34377742","className":"text-blue-600 hover:underline","children":"How to train word2vec model efficiently in the spark cluster environment?"}]}],["$","li","43321492",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/43321492","className":"text-blue-600 hover:underline","children":"Word2Vec: Any way to train model fastly?"}]}],["$","li","59919462",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/59919462","className":"text-blue-600 hover:underline","children":"Speed Up Gensim's Word2vec for a Massive Dataset"}]}],["$","li","48086226",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/48086226","className":"text-blue-600 hover:underline","children":"Spark MLib Word2Vec Error: The vocabulary size should be > 0"}]}],["$","li","48625984",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/48625984","className":"text-blue-600 hover:underline","children":"Word2Vec on Spark Scala"}]}],["$","li","48153957",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/48153957","className":"text-blue-600 hover:underline","children":"Spark MLin Word2vec"}]}],["$","li","45841895",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/45841895","className":"text-blue-600 hover:underline","children":"Spark standalone : SparklyR : Performance 
issues"}]}],["$","li","37171911",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/37171911","className":"text-blue-600 hover:underline","children":"Training Sparks word2vec with a RDD[String]"}]}],["$","li","43685511",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/43685511","className":"text-blue-600 hover:underline","children":"Word2Vec : Apache Spark and Tensorflow implementations"}]}],["$","li","39755358",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/39755358","className":"text-blue-600 hover:underline","children":"Distributed Word2Vec Model Training using Apache Spark 2.0.0 and mllib"}]}]]}]]}]]}],["$","$L11",null,{}],["$","$L12",null,{}],["$","$L13",null,{}],["$","$L14",null,{}],["$","$L15",null,{}]]

How to speed up training Word2vec model with spark?

Answers (1)
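
The answer's suggestion to increase the number of partitions could be sketched roughly as follows. This is a minimal sketch, not a tuned configuration: the toy corpus, the `local[24]` master, the partition count of 24 (taken from the core count in the question), and `setMinCount(0)` are all assumptions made for illustration. It also uses `setNumPartitions` on the estimator, which the answer does not mention; in `spark.ml` that parameter defaults to 1, so it may be worth checking alongside the DataFrame's own partitioning.

```scala
import org.apache.spark.ml.feature.Word2Vec
import org.apache.spark.sql.SparkSession

object Word2VecPartitions {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("word2vec-partitions")
      .master("local[24]") // assumption: a single machine with 24 cores, as in the question
      .getOrCreate()
    import spark.implicits._

    // Hypothetical toy corpus; the real input would be the asker's DataFrame
    // with an array<string> column named "words".
    val docs = Seq(
      "spark makes distributed training possible",
      "word vectors capture word meaning"
    ).map(_.split(" ").toSeq).toDF("words")

    // Assumption: one partition per core as a starting point; tune for the real data.
    val numPartitions = 24

    // Round-robin repartitioning spreads rows evenly, which addresses skew
    // when no natural partitioning key is available.
    val balanced = docs.repartition(numPartitions)

    val w2v = new Word2Vec()
      .setInputCol("words")
      .setOutputCol("features")
      .setMinCount(0)                  // toy corpus only; the question uses 5
      .setNumPartitions(numPartitions) // defaults to 1 if left unset

    val model = w2v.fit(balanced)
    model.getVectors.show(5, truncate = false)

    spark.stop()
  }
}
```

One trade-off to keep in mind: the Spark API documentation recommends a small number of partitions for accuracy, so raising `numPartitions` is likely to speed up training at some cost in vector quality. Comparing a few values on a sample of the data is a reasonable way to pick one.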

Related Questions