Reputation: 144
I am currently writing standalone PySpark 2.2.1 applications against a MongoDB 2.6 database, using version 2.2.1 of the mongo-spark connector. When running the application with spark-submit, I get the error below:
: java.lang.ClassNotFoundException: spark.mongodb.input.partitionerOptions.MongoPaginateBySizePartitioner
I tried to specify the partitioner while reading the data from the MongoDB database. This is what my read looks like:
users = spark.read.format("com.mongodb.spark.sql.DefaultSource") \
    .option("uri", "mongodb://127.0.0.1/xyz.abc") \
    .option("partitioner", "spark.mongodb.input.partitionerOptions.MongoPaginateBySizePartitioner ") \
    .load()
I followed this link to specify the partitioner:
https://docs.mongodb.com/spark-connector/master/configuration/.
Even the DefaultPartitioner does not work here; I get the same error.
Any help would be appreciated. Thanks.
Upvotes: 0
Views: 1820
Reputation: 144
It seems the problem was with specifying the partitioner via .option("key", "value"). Instead, I set it while instantiating the SparkSession:
spark = SparkSession \
    .builder \
    .appName("data_pull") \
    .master("local") \
    .config("spark.mongodb.input.partitioner", "MongoPaginateBySizePartitioner") \
    .getOrCreate()
It also seems that MongoDefaultPartitioner uses the $sample aggregation stage, which was only introduced in MongoDB 3.2, so it cannot work against MongoDB 2.6.
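For reference, the same settings can also be passed at submit time instead of in code; a minimal sketch, assuming the application script is named data_pull.py and the connector is pulled in via --packages (the Scala 2.11 build of connector 2.2.1; adjust the coordinate to your Spark/Scala build):

```shell
# Pass the MongoDB URI and partitioner as Spark config at submit time.
# Note the partitioner value is just the partitioner name, not the full
# "spark.mongodb.input.partitionerOptions..." key.
spark-submit \
  --packages org.mongodb.spark:mongo-spark-connector_2.11:2.2.1 \
  --conf "spark.mongodb.input.uri=mongodb://127.0.0.1/xyz.abc" \
  --conf "spark.mongodb.input.partitioner=MongoPaginateBySizePartitioner" \
  data_pull.py
```

With the config supplied this way, the read in the application no longer needs any .option("partitioner", ...) call at all.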
Upvotes: 0