Reputation: 343
I am wondering if there is a way to make a SparkSQL table registered in sqlContext directly visible to other processes, for example Tableau.
I did some research on the Thrift server, but I didn't find any specific explanation of it. Is it a middleware between Hive (the database) and the application (the client)? If so, do I need to write into a Hive table in my Spark program?
When I use Beeline to check the tables from the Thrift server, there's a field isTempTable. Could someone explain what it means? I'm guessing it refers to a temp table in the sqlContext of the Thrift server, because I read that the Thrift server is a Spark driver program and all cached tables are visible across multiple programs. My confusion here is: if it is a driver program, where are the workers?
To summarize: what is the right way to make tables in SparkSQL visible to other applications? Do I need to write them into Hive with
dataFrame.write.mode(SaveMode.Append).saveAsTable()
? Thanks
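For context, the kind of write I mean looks roughly like this in my Spark program (table and DataFrame names here are made up):

```scala
// Sketch only: assumes a HiveContext-backed sqlContext and an existing
// DataFrame; saveAsTable goes through the Hive metastore, so the table
// persists across sessions rather than living only in this sqlContext
import org.apache.spark.sql.SaveMode

val df = sqlContext.table("my_temp_table")   // placeholder DataFrame
df.write.mode(SaveMode.Append).saveAsTable("my_hive_table")
```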
Upvotes: 8
Views: 1066
Reputation: 1665
I assume you've moved on by now, but for anyone who comes across this answer, the Thrift server is effectively a broker between a JDBC connection and SparkSQL.
Once you've got Thrift running (see the Spark docs for a basic intro), you connect over JDBC using the Hive JDBC drivers to Thrift, and it in turn relays your SQL queries to Spark using a HiveContext.
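For example, with a stock Spark distribution the flow might look like this from the shell (host, port, and username below are placeholders; 10000 is the default Thrift port):

```shell
# Start the Thrift JDBC server (script ships with the Spark distribution)
./sbin/start-thriftserver.sh

# Connect with Beeline, which uses the Hive JDBC driver under the hood;
# any JDBC client (e.g. Tableau) can use the same jdbc:hive2:// URL
./bin/beeline -u jdbc:hive2://localhost:10000 -n myuser
```

Tableau or any other JDBC-capable tool connects with that same `jdbc:hive2://host:port` URL via the Hive JDBC driver.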
If you have a full Hive metastore up and running, you should be able to see the Hive tables in your JDBC client immediately, otherwise you can create tables on demand by running commands like this in your JDBC client:
CREATE TABLE data1 USING org.apache.spark.sql.parquet OPTIONS (path "/path/to/parquetfile");
CREATE TABLE data2 USING org.apache.spark.sql.json OPTIONS (path "/path/to/jsonfile");
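Once registered, those tables can be queried from the same JDBC session like any ordinary Hive table, e.g. (assuming the files loaded successfully):

```sql
SHOW TABLES;
SELECT * FROM data1 LIMIT 10;
```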
Hope this helps a little.
Upvotes: 8