Reputation: 162
I want to connect MongoDB Atlas with PySpark inside a Microsoft Fabric notebook. Here is my PySpark code:
from pyspark.sql import SparkSession

mongo_uri = "mongodb+srv://<username>:<password>@cluster1.hju3l.mongodb.net/?retryWrites=true&w=majority&appName=Cluster1"

my_spark = SparkSession \
    .builder \
    .appName("myApp") \
    .config("spark.mongodb.read.connection.uri", mongo_uri) \
    .config("spark.mongodb.write.connection.uri", mongo_uri) \
    .config("spark.jars.packages", "org.mongodb.spark:mongo-spark-connector_2.12:10.2.0") \
    .getOrCreate()

df = my_spark.read.format("mongodb") \
    .option("database", "lead") \
    .option("collection", "users") \
    .load()
df.printSchema()
But when I try to run the above code, it throws the error below:
Py4JJavaError: An error occurred while calling o6558.load.
: org.apache.spark.SparkClassNotFoundException: [DATA_SOURCE_NOT_FOUND] Failed to find the data source: mongodb. Please find packages at `https://spark.apache.org/third-party-projects.html`.
From searching for the cause of this issue, it appears the mongo-spark-connector jar is not being found, but I have uploaded the jar file in the library section of Microsoft Fabric (custom library section) and also installed mongoengine in the public library section.
I have also uploaded the same jar file (mongo-spark-connector_2.12-10.2.0.jar) into the notebook's Spark environment. Below is the screenshot.
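To check whether the connector is actually visible to the session, the running JVM can be probed for the connector's entry class (a quick sanity-check sketch; com.mongodb.spark.sql.connector.MongoTableProvider should be the DataSource class of the 10.x connector):

try:
    # Ask the Spark JVM (via py4j) whether the connector class is on the classpath
    spark._jvm.java.lang.Class.forName("com.mongodb.spark.sql.connector.MongoTableProvider")
    print("mongo-spark-connector is on the classpath")
except Exception as err:  # py4j raises if the class cannot be loaded
    print("connector not found:", err)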
Upvotes: 0
Views: 203
Reputation: 89361
Try it from Scala; sometimes the JVM libraries don't load correctly for PySpark.
For PySpark, you can load the library directly from Maven by configuring your session. E.g., this is for Snowflake:
%%configure -f
{
    "conf": {
        "spark.jars.packages": "net.snowflake:spark-snowflake_2.12:2.12.0-spark_3.2"
    }
}
For MongoDB, you would use something like 'org.mongodb.spark:mongo-spark-connector_2.12:10.2.0' (the Scala 2.12 build, matching the Fabric Spark runtime).
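Adapted to the connector in question, the cell would look like this (a sketch; it assumes the Scala 2.12 build of the 10.2.0 connector, and note that -f forces a Spark session restart, so run it before your other cells):

%%configure -f
{
    "conf": {
        "spark.jars.packages": "org.mongodb.spark:mongo-spark-connector_2.12:10.2.0"
    }
}

After the session comes back up, spark.read.format("mongodb") should resolve without the manual jar upload.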
Upvotes: 0