user3693309

Reputation: 343

SparkSQL, Thrift Server and Tableau

I am wondering if there is a way to make a SparkSQL table registered in sqlContext directly visible to other processes, for example Tableau.

I did some research on the Thrift server, but I couldn't find a clear explanation of it. Is it middleware between Hive (the database) and the application (the client)? If so, do I need to write to a Hive table from my Spark program?

When I use Beeline to check the tables exposed by the Thrift server, there is a field called isTempTable. What does it mean? My guess is that it refers to a temporary table in the sqlContext of the Thrift server, because I read that the Thrift server is a Spark driver program and all cached tables are visible to multiple programs through it. My confusion here is: if it is a driver program, where are the workers?

To summarize,

  1. Where should I write my DataFrame, or the tables in sqlContext, to? Which method should I use (for example dataFrame.write.mode(SaveMode.Append).saveAsTable() — see the sketch after this list)?
  2. Are the default settings fine for the Thrift server, or are changes necessary?
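To make question 1 concrete, this is roughly the kind of write I mean (just a sketch, with a placeholder input path and table name, assuming sqlContext is a HiveContext):

import org.apache.spark.sql.SaveMode

// Placeholder DataFrame; in my program it comes from earlier transformations.
val dataFrame = sqlContext.read.parquet("/path/to/input")

// Write the DataFrame as a table in the metastore so that other
// processes (e.g. Tableau through the Thrift server) can see it.
dataFrame.write.mode(SaveMode.Append).saveAsTable("results")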

Thanks

Upvotes: 8

Views: 1066

Answers (1)

Ewan Leith

Reputation: 1665

I assume you've moved on by now, but for anyone who comes across this answer, the Thrift server is effectively a broker between a JDBC connection and SparkSQL.

Once you've got Thrift running (see the Spark docs for a basic intro), you connect to it over JDBC using the Hive JDBC drivers, and it in turn relays your SQL queries to Spark using a HiveContext.
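For example, a bare-bones JDBC connection from Scala looks roughly like this (a sketch only; the host, port, credentials and table name are placeholders, assuming the default unauthenticated Thrift setup on port 10000):

import java.sql.DriverManager

// Register the Hive JDBC driver (the same driver Beeline uses).
Class.forName("org.apache.hive.jdbc.HiveDriver")

// Connect to the Thrift server; placeholder host/port and empty credentials.
val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "user", "")
val stmt = conn.createStatement()

// The SQL is relayed to Spark and executed via a HiveContext.
val rs = stmt.executeQuery("SELECT COUNT(*) FROM my_table")
while (rs.next()) println(rs.getLong(1))

conn.close()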

If you have a full Hive metastore up and running, you should be able to see the Hive tables in your JDBC client immediately, otherwise you can create tables on demand by running commands like this in your JDBC client:

CREATE TABLE data1 USING org.apache.spark.sql.parquet OPTIONS (path "/path/to/parquetfile");
CREATE TABLE data2 USING org.apache.spark.sql.json OPTIONS (path "/path/to/jsonfile");

Hope this helps a little.

Upvotes: 8
