Reputation: 31
I am trying to insert a Spark SQL DataFrame into a remote MongoDB collection. Previously I wrote a Java program with MongoClient to check whether the remote collection is accessible, and I was able to do so successfully.
My present Spark code is as below -
scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
warning: there was one deprecation warning; re-run with -deprecation for details
sqlContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@1a8b22b5
scala> val depts = sqlContext.sql("select * from test.user_details")
depts: org.apache.spark.sql.DataFrame = [user_id: string, profile_name: string ... 7 more fields]
scala> depts.write.options(scala.collection.Map("uri" -> "mongodb://<username>:<pwd>@<hostname>:27017/<dbname>.<collection>")).mode(SaveMode.Overwrite).format("com.mongodb.spark.sql").save()
This is giving the following error -
java.lang.AbstractMethodError: com.mongodb.spark.sql.DefaultSource.createRelation(Lorg/apache/spark/sql/SQLContext;Lorg/apache/spark/sql/SaveMode;Lscala/collection/immutable/Map;Lorg/apache/spark/sql/Dataset;)Lorg/apache/spark/sql/sources/BaseRelation;
at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:429)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
... 84 elided
I also tried the following, which throws the error below:
scala> depts.write.options(scala.collection.Map("uri" -> "mongodb://<username>:<pwd>@<host>:27017/<database>.<collection>")).mode(SaveMode.Overwrite).save()
java.lang.IllegalArgumentException: 'path' is not specified
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$17.apply(DataSource.scala:438)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$17.apply(DataSource.scala:438)
at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
at org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.getOrElse(ddl.scala:117)
at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:437)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
... 58 elided
I have imported the following packages -
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import com.mongodb.casbah.{WriteConcern => MongodbWriteConcern}
import com.mongodb.spark.config._
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql._
depts.show() is working as expected, i.e. the DataFrame is created successfully.
Can someone please provide any advice/suggestions on this? Thanks
Upvotes: 3
Views: 1876
Reputation: 74759
Have a look at this error and think about possible ways to deal with it. It is due to a version mismatch between the Spark Connector for MongoDB and the Spark version you use.
java.lang.AbstractMethodError: com.mongodb.spark.sql.DefaultSource.createRelation(Lorg/apache/spark/sql/SQLContext;Lorg/apache/spark/sql/SaveMode;Lscala/collection/immutable/Map;Lorg/apache/spark/sql/Dataset;)Lorg/apache/spark/sql/sources/BaseRelation;
Quoting the javadoc of java.lang.AbstractMethodError:
Thrown when an application tries to call an abstract method. Normally, this error is caught by the compiler; this error can only occur at run time if the definition of some class has incompatibly changed since the currently executing method was last compiled.
That pretty much explains what you experience (note the part that starts with "this error can only occur at run time").
My guess is that the Lorg/apache/spark/sql/Dataset part of the DefaultSource.createRelation entry in the stack trace is exactly the culprit. In other words, the connector's createRelation (shown below) uses data: DataFrame, not Dataset, and they are incompatible in this direction, i.e. DataFrame is simply a Scala type alias of Dataset[Row], but an arbitrary Dataset is not a DataFrame, hence the runtime error.
override def createRelation(sqlContext: SQLContext, mode: SaveMode, parameters: Map[String, String], data: DataFrame): BaseRelation
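If that is the case, the usual way out is to run with a connector artifact built for the same Spark line as your cluster. A minimal sketch, assuming Spark 2.0.x with Scala 2.11 (the coordinates and version numbers are illustrative and have to match your installation):
// Start the shell with a connector from the matching release line, e.g.:
//   spark-shell --packages org.mongodb.spark:mongo-spark-connector_2.11:2.0.0
import org.apache.spark.sql.SaveMode

// With matching versions the original write call should work unchanged
depts.write
  .format("com.mongodb.spark.sql")
  .option("uri", "mongodb://<username>:<pwd>@<hostname>:27017/<dbname>.<collection>")
  .mode(SaveMode.Overwrite)
  .save()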
Upvotes: 0
Reputation: 18845
Assuming that you are using MongoDB Spark Connector v1.0, you can save the result of a Spark SQL query (a DataFrame) as below:
// DataFrames SQL example
import com.mongodb.spark._                        // provides the MongoSpark helper
df.registerTempTable("temporary")                 // optionally expose an existing DataFrame to SQL
val depts = sqlContext.sql("select * from test.user_details")
depts.show()
// Save out the filtered DataFrame result
MongoSpark.save(depts.write.option("uri", "mongodb://hostname:27017/database.collection").mode("overwrite"))
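A variant of the same save, passing the DataFrame together with an explicit WriteConfig, is sketched below; the option map is an assumption and needs your actual connection details:
// Alternative sketch: save the DataFrame with an explicit WriteConfig
// (the uri below is a placeholder; adjust host, database and collection)
import com.mongodb.spark._
import com.mongodb.spark.config.WriteConfig

val writeConfig = WriteConfig(Map("uri" -> "mongodb://hostname:27017/database.collection"))
MongoSpark.save(depts, writeConfig)               // writes the rows of depts to the configured collection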
For more information see MongoDB Spark Connector: Spark SQL
For a simple demo of MongoDB and Spark using Docker see MongoDB Spark Docker: examples.scala - dataframes
Upvotes: 1