Dwipam Katariya

Reputation: 144

Unable to specify partitioner in spark-mongo connector (Class not found Exception)

I am writing a standalone PySpark 2.2.1 application with MongoDB 2.6 as the database, using version 2.2.1 of the mongo-spark connector. When I run the application with spark-submit, I get the error below:

: java.lang.ClassNotFoundException: spark.mongodb.input.partitionerOptions.MongoPaginateBySizePartitioner

I tried to specify the partitioner while reading the data from the MongoDB database. This is what my read looks like:

users = spark.read.format("com.mongodb.spark.sql.DefaultSource") \
        .option("uri" , "mongodb://127.0.0.1/xyz.abc") \
        .option("partitioner", "spark.mongodb.input.partitionerOptions.MongoPaginateBySizePartitioner ") \
        .load()

I followed this page to specify the partitioner:
https://docs.mongodb.com/spark-connector/master/configuration/.
Even the DefaultPartitioner does not work; I get the same error.
Any help would be appreciated. Thanks.

Upvotes: 0

Views: 1820

Answers (1)

Dwipam Katariya

Reputation: 144

It seems the problem was in how I specified the partitioner via .option("key", "value"). Instead, I set it while instantiating the SparkSession:

from pyspark.sql import SparkSession

spark = SparkSession \
        .builder \
        .appName("data_pull") \
        .master("local") \
        .config("spark.mongodb.input.partitioner", "MongoPaginateBySizePartitioner") \
        .getOrCreate()
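
For reference, the same setting should also work as a per-read option, as long as the value is just the partitioner name rather than the full configuration key (the original .option call passed the full key, plus a trailing space, which is why it surfaced as a ClassNotFoundException). A minimal sketch, assuming the same local URI from the question:

# Per-read form: the "partitioner" option takes only the class name,
# not the "spark.mongodb.input..." configuration key.
users = spark.read.format("com.mongodb.spark.sql.DefaultSource") \
        .option("uri", "mongodb://127.0.0.1/xyz.abc") \
        .option("partitioner", "MongoPaginateBySizePartitioner") \
        .load()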

It also seems that the MongoDefaultPartitioner uses the $sample aggregation stage, which was only introduced in MongoDB 3.2, so it cannot work against the MongoDB 2.6 server used here.
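
To confirm whether $sample is available, you can check the server version directly. A minimal sketch using pymongo (assuming it is installed and pointed at the same local mongod):

# Check the MongoDB server version; the $sample stage used by
# MongoDefaultPartitioner requires MongoDB 3.2 or later.
from pymongo import MongoClient

client = MongoClient("mongodb://127.0.0.1")
print(client.server_info()["version"])  # e.g. "2.6.12" -> no $sample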

Upvotes: 0
