Reputation: 171
I am trying to use Apache Sedona with Python, specifically with PySpark version 3.5.0 and Python 3.11.6. However, I am encountering an issue related to an unresolved dependency during the setup process. The relevant part of the error message is as follows:
:::: WARNINGS
module not found: edu.ucar#cdm-core;5.4.2
==== local-m2-cache: tried
file:/.m2/repository/edu/ucar/cdm-core/5.4.2/cdm-core-5.4.2.pom
-- artifact edu.ucar#cdm-core;5.4.2!cdm-core.jar:
file:/.m2/repository/edu/ucar/cdm-core/5.4.2/cdm-core-5.4.2.jar
==== local-ivy-cache: tried
/.ivy2/local/edu.ucar/cdm-core/5.4.2/ivys/ivy.xml
-- artifact edu.ucar#cdm-core;5.4.2!cdm-core.jar:
/.ivy2/local/edu.ucar/cdm-core/5.4.2/jars/cdm-core.jar
==== central: tried
https://repo1.maven.org/maven2/edu/ucar/cdm-core/5.4.2/cdm-core-5.4.2.pom
-- artifact edu.ucar#cdm-core;5.4.2!cdm-core.jar:
https://repo1.maven.org/maven2/edu/ucar/cdm-core/5.4.2/cdm-core-5.4.2.jar
==== spark-packages: tried
https://repos.spark-packages.org/edu/ucar/cdm-core/5.4.2/cdm-core-5.4.2.pom
-- artifact edu.ucar#cdm-core;5.4.2!cdm-core.jar:
https://repos.spark-packages.org/edu/ucar/cdm-core/5.4.2/cdm-core-5.4.2.jar
::::::::::::::::::::::::::::::::::::::::::::::
:: UNRESOLVED DEPENDENCIES ::
::::::::::::::::::::::::::::::::::::::::::::::
:: edu.ucar#cdm-core;5.4.2: not found
::::::::::::::::::::::::::::::::::::::::::::::
The code I am using is as follows:
from pyspark.sql import SparkSession
from pyspark import StorageLevel
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point
from shapely.geometry import Polygon
from sedona.spark import *
from sedona.core.geom.envelope import Envelope
config = SedonaContext.builder(). \
    config('spark.jars.packages',
           'org.apache.sedona:sedona-spark-shaded-3.4_2.12:1.5.1,'
           'org.datasyslab:geotools-wrapper:1.5.1-28.2'). \
    getOrCreate()
sedona = SedonaContext.create(config)
sc = sedona.sparkContext
print("Sedona context is " + sc)
I followed the official documentation, but it seems that there is an unresolved dependency issue, possibly related to missing packages or configurations. The official documentation does not provide an exhaustive list of the required dependencies for successful setup. Can you help clarify what additional configurations or packages might be needed to resolve this issue and set up Apache Sedona with PySpark 3.5.0 successfully?
Upvotes: 3
Views: 555
Reputation: 304
Due to a bug in Sedona 1.5.1, the cdm-core jar becomes a required dependency. You can easily fix this by using the jar org.apache.sedona:sedona-spark-3.4_2.12:1.5.1 instead. Note that we are not using the shaded jar.
The unshaded jar is only supposed to be used with a Maven resolver (e.g., spark.jars.packages is a Maven resolver), because it has lots of compile dependencies that the resolver downloads automatically; those dependencies are not packaged in the Sedona unshaded jar.
The shaded jar is only supposed to be used when you don't have a resolver (e.g., in an environment that has no internet access).
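For example, a minimal sketch of the builder configuration using the unshaded artifact (the geotools-wrapper coordinate is carried over from the question; adjust versions to your setup):
from sedona.spark import SedonaContext

# Unshaded Sedona artifact: the Maven resolver behind spark.jars.packages
# downloads its compile dependencies (including cdm-core) automatically.
config = SedonaContext.builder(). \
    config('spark.jars.packages',
           'org.apache.sedona:sedona-spark-3.4_2.12:1.5.1,'
           'org.datasyslab:geotools-wrapper:1.5.1-28.2'). \
    getOrCreate()
sedona = SedonaContext.create(config)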
Upvotes: 5
Reputation: 2468
You can try specifying the repository from which the jar files can be downloaded. Here the spark.jars.repositories config is used to point at the Maven repository.
The cdm-core artifact can be found at
https://mvnrepository.com/artifact/edu.ucar/cdm-core/5.4.2
from sedona.spark import *
packages = "org.apache.sedona:sedona-spark-shaded-3.4_2.12:1.5.1," \
"org.datasyslab:geotools-wrapper:1.5.1-28.2," \
"edu.ucar:cdm-core:5.4.2,"
repository = "https://repo1.maven.org/maven2"
config = SedonaContext.builder() \
    .config("spark.jars.packages", packages) \
    .config("spark.jars.repositories", repository) \
    .getOrCreate()
sedona = SedonaContext.create(config)
sc = sedona.sparkContext
print("Sedona context is ", sc)
Upvotes: 0