ritam mukherjee

Reputation: 31

Why does MongoDB Spark Connector fail with AbstractMethodError?

I am trying to insert a Spark SQL DataFrame into a remote MongoDB collection. Previously I wrote a Java program with MongoClient to check whether the remote collection is accessible, and I was able to do so successfully.

My present spark code is as below -

scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
warning: there was one deprecation warning; re-run with -deprecation for details
sqlContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@1a8b22b5
scala> val depts = sqlContext.sql("select * from test.user_details")
depts: org.apache.spark.sql.DataFrame = [user_id: string, profile_name: string ... 7 more fields]
scala> depts.write.options(scala.collection.Map("uri" -> "mongodb://<username>:<pwd>@<hostname>:27017/<dbname>.<collection>")).mode(SaveMode.Overwrite).format("com.mongodb.spark.sql").save()

This is giving the following error -

java.lang.AbstractMethodError: com.mongodb.spark.sql.DefaultSource.createRelation(Lorg/apache/spark/sql/SQLContext;Lorg/apache/spark/sql/SaveMode;Lscala/collection/immutable/Map;Lorg/apache/spark/sql/Dataset;)Lorg/apache/spark/sql/sources/BaseRelation;
  at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:429)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
  ... 84 elided

I also tried the following, which throws the error below:

scala> depts.write.options(scala.collection.Map("uri" -> "mongodb://<username>:<pwd>@<host>:27017/<database>.<collection>")).mode(SaveMode.Overwrite).save()
java.lang.IllegalArgumentException: 'path' is not specified
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$17.apply(DataSource.scala:438)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$17.apply(DataSource.scala:438)
  at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
  at org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.getOrElse(ddl.scala:117)
  at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:437)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
  ... 58 elided

I have imported the following packages -

import org.apache.spark.{SparkConf, SparkContext}

import org.apache.spark.sql.SQLContext

import com.mongodb.casbah.{WriteConcern => MongodbWriteConcern}

import com.mongodb.spark.config._

import org.apache.spark.sql.hive.HiveContext

import org.apache.spark.sql._

depts.show() is working as expected, i.e. the DataFrame is successfully created.

Can someone please provide any advice or suggestions on this? Thanks.

Upvotes: 3

Views: 1876

Answers (2)

Jacek Laskowski

Reputation: 74759

Have a look at this error and think about what could cause it. It is due to a version mismatch between the MongoDB Spark Connector and the version of Spark you use.

java.lang.AbstractMethodError: com.mongodb.spark.sql.DefaultSource.createRelation(Lorg/apache/spark/sql/SQLContext;Lorg/apache/spark/sql/SaveMode;Lscala/collection/immutable/Map;Lorg/apache/spark/sql/Dataset;)Lorg/apache/spark/sql/sources/BaseRelation;

Quoting the javadoc of java.lang.AbstractMethodError:

Thrown when an application tries to call an abstract method. Normally, this error is caught by the compiler; this error can only occur at run time if the definition of some class has incompatibly changed since the currently executing method was last compiled.

That pretty much explains what you experience (note the part that starts with "this error can only occur at run time").

My guess is that the part Lorg/apache/spark/sql/Dataset in the DefaultSource.createRelation method in the stack trace is exactly the culprit.

In other words, the method expects data: DataFrame, not an arbitrary Dataset, and the two are incompatible in this direction: DataFrame is simply a Scala type alias for Dataset[Row], but not every Dataset is a DataFrame, hence the runtime error.

override def createRelation(sqlContext: SQLContext, mode: SaveMode, parameters: Map[String, String], data: DataFrame): BaseRelation
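The fix is therefore to use a connector artifact built against the same Spark line as your cluster. A minimal sketch, assuming Spark 2.0.x with Scala 2.11 and using the depts DataFrame from the question (the artifact coordinates below are an assumption, pick whichever matches your Spark and Scala versions):

// Restart the shell with a matching connector, for example:
//   spark-shell --packages org.mongodb.spark:mongo-spark-connector_2.11:2.0.0

import org.apache.spark.sql.SaveMode

// With matching versions, the original write call resolves the Spark 2.x createRelation above.
depts.write
  .options(Map("uri" -> "mongodb://<username>:<pwd>@<hostname>:27017/<dbname>.<collection>"))
  .mode(SaveMode.Overwrite)
  .format("com.mongodb.spark.sql")
  .save()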

Upvotes: 0

Wan B.

Reputation: 18845

Assuming that you are using MongoDB Spark Connector v1.0, you can save a Spark SQL DataFrame like below:

// DataFrames SQL example
import com.mongodb.spark.MongoSpark

// Query the Hive table into a DataFrame
val depts = sqlContext.sql("select * from test.user_details")
depts.show()

// Save the DataFrame result out to MongoDB
MongoSpark.save(depts.write.option("uri", "mongodb://hostname:27017/database.collection").mode("overwrite"))
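Alternatively, you can pass the output settings through an explicit WriteConfig. This is only a sketch using the depts DataFrame from above; the "uri"/"database"/"collection" option keys and the MongoSpark.save(dataset, writeConfig) overload should be checked against your connector version, and hostname, database and collection are placeholders:

import com.mongodb.spark.MongoSpark
import com.mongodb.spark.config.WriteConfig

// Build a WriteConfig from plain options (key names assumed; see the connector docs)
val writeConfig = WriteConfig(Map(
  "uri" -> "mongodb://hostname:27017/",
  "database" -> "database",
  "collection" -> "collection"))

// Save the DataFrame with the explicit config instead of the DataFrameWriter options
MongoSpark.save(depts, writeConfig)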

For more information see MongoDB Spark Connector: Spark SQL

For a simple demo of MongoDB and Spark using docker see MongoDB Spark Docker: examples.scala - dataframes

Upvotes: 1
