Reputation: 2703
I'm trying to run LogisticRegressionWithLBFGS from MLlib, and I get Hive-related errors:
py4j.protocol.Py4JJavaError: An error occurred while calling o337.trainLogisticRegressionModelWithLBFGS.
: org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient;
The thing is, I never installed Hive. So why does this function rely on Hive? The documentation doesn't mention it anywhere. Is installing Hive a prerequisite for running any MLlib function?
Upvotes: 0
Views: 471
Reputation: 191904
A Hive installation is not needed, but Spark needs Hive-compatible classes to operate on DataFrame objects, such as those within an ML pipeline step.
The pip install pyspark package, for example, doesn't come with these (or any Hadoop) libraries, as far as I know.
If you downloaded Spark with Hadoop from the Apache site, then you will get Hive libraries and a bin/pyspark script. On Windows, though, you might need to set up WinUtils.
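To check whether a given Spark directory actually ships the Hive libraries, something like the following rough sketch may help. It assumes the usual layout of a Spark 2.x or later download, where the jars sit in a jars/ folder under the Spark home; find_hive_jars is a made-up helper name, not part of Spark.

```python
import glob
import os


def find_hive_jars(spark_home):
    """Return the sorted file names of Hive-related jars under <spark_home>/jars.

    Assumption: the distribution keeps its jars in a "jars" subdirectory,
    as the prebuilt Spark 2.x+ downloads do.
    """
    pattern = os.path.join(spark_home, "jars", "*hive*.jar")
    return sorted(os.path.basename(path) for path in glob.glob(pattern))


if __name__ == "__main__":
    # SPARK_HOME is assumed to point at the unpacked Spark download.
    home = os.environ.get("SPARK_HOME")
    if home is None:
        print("SPARK_HOME is not set")
    else:
        jars = find_hive_jars(home)
        print(jars if jars else "no Hive jars found under " + home)
```

An empty result suggests the Hive-compatible classes are missing from that installation. A non-empty result only shows the jars are present, not that the metastore client can be instantiated.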
Upvotes: 1