Reputation: 1178
I use CentOS on the Cloudera QuickStart VM. I created an sbt-managed Spark application following the other question How to save DataFrame directly to Hive?.
build.sbt
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.2"
libraryDependencies += "org.apache.spark" % "spark-sql_2.10" % "1.5.2"
libraryDependencies += "org.apache.spark" % "spark-mllib_2.10" % "1.5.2"
libraryDependencies += "org.apache.spark" % "spark-streaming_2.10" % "1.5.2"
libraryDependencies += "org.apache.spark" %% "spark-hive" % "1.5.2"
I'd like to use a DataFrame as a Hive table as follows:
recordDF.registerTempTable("mytempTable")
hiveContext.sql("create table productstore as select * from mytempTable");
I noticed that I am getting the error:
The root scratch dir: /tmp/hive should be writable. Current permissions are: rwx------
I followed other questions and did chmod 777 on /tmp/hive in HDFS.
It occurred to me now that Spark was using the local filesystem's /tmp/hive, so I did a chmod on the local filesystem too.
Now I am getting the error:
org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:file:/user/hive/warehouse/productstore is not a directory or unable to create one)
I'd like to store a DataFrame in HDFS hive warehouse.
Upvotes: 0
Views: 3128
Reputation: 74619
There are two issues in play here.
On CentOS (or any other Unix flavour, e.g. another Linux distribution or macOS), just do the following:
chmod -R 777 /tmp/hive
(The directory should be writable by anyone on the OS, since it is a temporary directory.)
Since you work with the Cloudera QuickStart VM, you are using a pre-2.0 Spark (1.6.3 perhaps?).
The issue is that no path is specified when persisting a DataFrame to a Hive table. The directory /user/hive/warehouse is used by default; to avoid writing there, set the path option when saving to a Hive table, either with the option method or by passing it to save.
df.write.option("path", "[path-here]").saveAsTable("tableName")
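For example, applied to the recordDF and hiveContext from the question, a minimal sketch could look like this (the HDFS path and SaveMode.Overwrite below are just example choices, not requirements):
import org.apache.spark.sql.SaveMode

// Write the DataFrame as a Hive table at an explicit HDFS location
// (the path below is only an example -- pick whatever suits your cluster).
recordDF.write
  .mode(SaveMode.Overwrite)                                   // replace the table if it already exists
  .option("path", "hdfs:///user/hive/warehouse/productstore") // explicit location in HDFS
  .saveAsTable("productstore")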
As of Spark 2.0, the above line will write to a local Hive metastore (using Derby) in the current directory, under spark-warehouse, as specified by the spark.sql.warehouse.dir Spark property.
To upgrade, please define your Spark dependencies using 2.0.2 (not 1.5.2):
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.2"
(That single line is all you really need for Spark SQL with Hive support.)
You can then use hive.metastore.warehouse.dir or spark.sql.warehouse.dir to set the Spark warehouse and point it to the HDFS location where the other Hive tables live.
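As a rough sketch (assuming Spark 2.0.2, the question's recordDF, and hdfs:///user/hive/warehouse as an example warehouse location):
import org.apache.spark.sql.SparkSession

// Build a SparkSession with Hive support and an HDFS-backed warehouse.
// The warehouse path is only an example -- adjust it to your cluster.
val spark = SparkSession.builder()
  .appName("productstore-demo")
  .config("spark.sql.warehouse.dir", "hdfs:///user/hive/warehouse") // shared warehouse in HDFS
  .enableHiveSupport()                                              // talk to the Hive metastore
  .getOrCreate()

// recordDF stands in for the DataFrame from the question.
recordDF.write.saveAsTable("productstore")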
Upvotes: 1