Qiao Robin

Reputation: 21

Spark-sql CLI uses only 1 executor when running a query

I want to use the spark-sql CLI to replace the Hive CLI shell. I start the spark-sql CLI with the following command (we are running on a YARN Hadoop cluster, and hive-site.xml has already been copied to /conf):

> spark-sql

The shell opens and works OK.

Then I execute a query like this:

spark-sql> select devicetype, count(*) from mytable group by devicetype;

The command executes successfully and the result is correct, but performance is very slow.

In the Spark job UI at http://myhost:4040, I noticed that only 1 executor is marked as used, so that may be the reason.

I tried modifying the spark-sql script to add --num-executors 500 to the exec command, but it did not help.

Could anyone help explain why?

Thanks.

Upvotes: 2

Views: 4900

Answers (2)

linehrr

Reputation: 1748

beeline> !connect jdbc:hive2://localhost:10002/default;transportMode=http;httpPath=cliservice

10002 is the port of my Spark Thrift server; change it to yours. You can find your Thrift port in your Thrift server log.

Upvotes: 0

0x0FFF

Reputation: 5018

Refer to the documentation: http://spark.apache.org/docs/latest/sql-programming-guide.html

spark-sql is a SQL CLI tool that works only in local mode, which is why you see only one executor.

If you want a cluster version of SQL, start the Thrift server and connect to it via JDBC, for example with the beeline tool that ships with Spark. This is described in the "Running the Thrift JDBC/ODBC server" chapter of the official documentation: http://spark.apache.org/docs/latest/sql-programming-guide.html

To start:

export HIVE_SERVER2_THRIFT_PORT=<listening-port>
export HIVE_SERVER2_THRIFT_BIND_HOST=<listening-host>
./sbin/start-thriftserver.sh \
  --master <master-uri> \
  ...

To connect:

./bin/beeline
beeline> !connect jdbc:hive2://<listening-host>:<listening-port>
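
As a rough sketch, the query from the question could also be run through beeline non-interactively, using its -u (JDBC URL) and -e (query) options. The host and port defaults below, the /default database suffix, and the jdbc_url helper are assumptions for illustration only; substitute the values your Thrift server actually listens on.

```shell
#!/bin/sh
# Hypothetical defaults: replace with your Thrift server's host and port.
THRIFT_HOST="${THRIFT_HOST:-localhost}"
THRIFT_PORT="${THRIFT_PORT:-10000}"

# Illustrative helper (not part of Spark): builds a HiveServer2 JDBC URL.
jdbc_url() {
  printf 'jdbc:hive2://%s:%s/default' "$1" "$2"
}

# The query from the question.
QUERY='select devicetype, count(*) from mytable group by devicetype;'
URL="$(jdbc_url "$THRIFT_HOST" "$THRIFT_PORT")"

# Run the query only if beeline is on the PATH; otherwise print the command.
if command -v beeline >/dev/null 2>&1; then
  beeline -u "$URL" -e "$QUERY"
else
  echo "beeline not found; would run: beeline -u $URL -e \"$QUERY\""
fi
```

With this approach the query executes inside the Thrift server's Spark application, so the number of executors is governed by the options given to start-thriftserver.sh rather than by the client.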

Upvotes: 1
