dipti

Reputation: 162

Connect MongoDB Atlas with PySpark inside a Microsoft Fabric notebook

I want to connect to MongoDB Atlas from PySpark inside a Microsoft Fabric notebook. Here is my PySpark code:

from pyspark.sql import SparkSession
mongo_uri = "mongodb+srv://<username>:<password>@cluster1.hju3l.mongodb.net/?retryWrites=true&w=majority&appName=Cluster1"
my_spark = SparkSession \
    .builder \
    .appName("myApp") \
    .config("spark.mongodb.read.connection.uri", mongo_uri) \
    .config("spark.mongodb.write.connection.uri", mongo_uri) \
    .config("spark.jars.packages", "org.mongodb.spark:mongo-spark-connector_2.12:10.2.0") \
    .getOrCreate()
df = spark.read.format("mongodb").option("database", "lead").option("collection", "users").load()
df.printSchema()

But when I run the code above, it throws the following error:

Py4JJavaError: An error occurred while calling o6558.load.
: org.apache.spark.SparkClassNotFoundException: [DATA_SOURCE_NOT_FOUND] Failed to find the data source: mongodb. Please find packages at `https://spark.apache.org/third-party-projects.html`.

From what I found while searching, this error means the mongo-spark-connector jar file cannot be found. However, I have uploaded the jar file to the custom library section of Microsoft Fabric, and I have also installed mongoengine in the public library section.

I have also uploaded the same jar file (mongo-spark-connector_2.12-10.2.0.jar) to the notebook's Spark environment. Screenshots of both uploads were attached.

Upvotes: 0

Views: 203

Answers (1)

David Browne - Microsoft

Reputation: 89361

Try from Scala. Sometimes the JVM libraries don't load correctly for PySpark.

For PySpark, you can load the library directly from Maven by configuring the Spark session in the first notebook cell. For example, this is the configuration for Snowflake:

%%configure -f
{
  "conf": {
    "spark.jars.packages": "net.snowflake:spark-snowflake_2.12:2.12.0-spark_3.2"
  }
}

You would use something like 'org.mongodb.spark:mongo-spark-connector_2.13:10.2.0', with the Scala suffix (_2.12 or _2.13) matching the Scala version of your Spark runtime.
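As a rough sketch of how the same pattern might look for MongoDB, the Python below assembles the Maven coordinate and prints the text of a matching %%configure cell. The helper names mongo_connector_coordinate and configure_cell are hypothetical, made up for illustration. The default Scala suffix 2.12 is an assumption taken from the jar name in the question, so check it against your runtime.

```python
import json


def mongo_connector_coordinate(scala_version: str = "2.12",
                               connector_version: str = "10.2.0") -> str:
    """Build the Maven coordinate for the MongoDB Spark connector.

    The defaults mirror the jar named in the question; they are assumptions,
    not values verified against any particular Fabric runtime.
    """
    return f"org.mongodb.spark:mongo-spark-connector_{scala_version}:{connector_version}"


def configure_cell(coordinate: str) -> str:
    """Return the text of a %%configure cell that requests the package."""
    body = {"conf": {"spark.jars.packages": coordinate}}
    return "%%configure -f\n" + json.dumps(body, indent=2)


if __name__ == "__main__":
    # Print the cell text so it can be pasted into the first notebook cell.
    print(configure_cell(mongo_connector_coordinate()))
```

Paste the printed text into the first cell of the notebook and run it before any other cell. After that, read through the notebook's existing spark session rather than building a new one. As far as I know, spark.jars.packages is resolved when the session starts, so setting it through SparkSession.builder on a session that is already running generally has no effect.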

Upvotes: 0
