femibyte

Reputation: 3507

Spark SQL CLI vs Thriftserver/Beeline

Can someone spell out the differences between using the Spark SQL CLI and the Thrift Server/Beeline to query or modify data in Hive? The Spark SQL documentation mentions both, but when would you use one or the other? Or are they functionally equivalent alternatives?

Upvotes: 2

Views: 1262

Answers (1)

Kristian

Reputation: 21830

For clarification:

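  • spark-sql is a command-line shell that starts its own Spark driver, a single Spark application tied to your session. You type SQL at a prompt, much like the mysql shell, and it uses the configured Hive metastore and the spark-warehouse directory for table storage. It does not connect to a Thrift Server; the Spark documentation notes that the Spark SQL CLI cannot talk to the Thrift JDBC server.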

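  • The Spark Thrift Server (started with sbin/start-thriftserver.sh) is a long-running Spark application that exposes that running Spark instance to remote clients over JDBC/ODBC using the HiveServer2 protocol, so several clients can share the same Spark instance at once. https://community.hortonworks.com/questions/33715/why-do-we-need-to-setup-spark-thrift-server.html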

  • Beeline is a command-line JDBC client that connects to a running HiveServer2-compatible endpoint (a jdbc:hive2:// URL), such as the Spark Thrift Server. This is why the Spark documentation uses beeline to test that the JDBC connection actually works. Note: other JDBC query tools, such as SQL Workbench, can also connect to the Spark Thrift Server if they are configured with the Hive JDBC driver and its dependent jars.
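A minimal sketch of the practical difference, assuming the Thrift Server runs on localhost on the default HiveServer2 port 10000 and that spark-sql and beeline are on your PATH (both assumptions). The helper names thrift_jdbc_url, spark_sql_command and beeline_command are hypothetical; they only build the command lines, using the real -e (query), -u (JDBC URL) and -n (user name) options. spark-sql needs no server because it starts its own driver, while beeline needs a URL pointing at an already running Thrift Server.

```python
import shlex

# Assumption: Thrift Server started via sbin/start-thriftserver.sh,
# listening on localhost and the default HiveServer2 port 10000.
DEFAULT_HOST = "localhost"
DEFAULT_PORT = 10000


def thrift_jdbc_url(host: str = DEFAULT_HOST, port: int = DEFAULT_PORT,
                    database: str = "default") -> str:
    """Hypothetical helper: build a HiveServer2-style JDBC URL."""
    return f"jdbc:hive2://{host}:{port}/{database}"


def spark_sql_command(query: str) -> list:
    """spark-sql starts its own Spark driver; no server is involved."""
    return ["spark-sql", "-e", query]


def beeline_command(query: str, user: str, host: str = DEFAULT_HOST,
                    port: int = DEFAULT_PORT) -> list:
    """beeline is only a JDBC client; it needs a running Thrift Server."""
    return ["beeline", "-u", thrift_jdbc_url(host, port), "-n", user, "-e", query]


if __name__ == "__main__":
    q = "SHOW TABLES"
    for cmd in (spark_sql_command(q), beeline_command(q, user="hive")):
        # Print the shell-quoted command; to actually run it (requires the
        # Spark binaries), use: subprocess.run(cmd, check=True)
        print(shlex.join(cmd))
```

Returning argument lists instead of executing them keeps the sketch runnable without a Spark installation; in practice you would pass each list to subprocess.run.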

Upvotes: 2
