Le Kim Trang

Reputation: 459

Engine hangs when training on large data

I am running into a problem with ALS (Similar Product template) when training on 400,000 records (around 200 MB). Training stops at Stage 13 : ===================== [0 + 1] / 2

Can anybody help me with this?

Upvotes: 0

Views: 224

Answers (1)

bodyjares

Reputation: 440

You are using the built-in mini Spark server when you launch training with the plain command:

pio train

This mini server has limited resources, so for a dataset of this size you need to launch your own Spark cluster. It can run on the same machine as your PredictionIO server. Start Spark in standalone mode with these commands:

./PredictionIO/vendors/spark-1.5.1/sbin/start-master.sh --webui-port 8180
./PredictionIO/vendors/spark-1.5.1/sbin/start-slave.sh spark://localhost:7077 --webui-port 8181

Then you can train using that Spark instance with this command:

pio train -- --master spark://localhost:7077 --driver-memory 4G --executor-memory 8G

If spark://localhost:7077 is not accessible, open the web UI on port 8180 to see the URL of the master (it is on the first line of the page). Use that URL both when starting the slave and when running pio train.
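Since the master URL has to match in both places, it may help to keep it in one spot. Here is a rough sketch, not part of PredictionIO or Spark: the helper names spark_master_url and pio_train_cmd are made up for illustration, and the defaults (localhost, port 7077, 4G driver, 8G executor) are simply the values used above. The second helper only prints the training command so you can check it before running it.

```shell
#!/bin/sh
# Hypothetical helpers (illustrative names, not provided by pio or Spark).

# Build a standalone master URL from a host and a port.
# Defaults to localhost and 7077, the values used in the commands above.
spark_master_url() {
    printf 'spark://%s:%s\n' "${1:-localhost}" "${2:-7077}"
}

# Print (do not run) the training command for a given master URL.
# $1 = master URL, $2 = driver memory (default 4G), $3 = executor memory (default 8G).
pio_train_cmd() {
    printf 'pio train -- --master %s --driver-memory %s --executor-memory %s\n' \
        "$1" "${2:-4G}" "${3:-8G}"
}
```

For example, if the web UI shows a host name other than localhost, something like pio_train_cmd "$(spark_master_url that-host)" prints the command to copy, and the same URL goes to start-slave.sh.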

Upvotes: 0
