I'm trying to read from Google Secret Manager. I deploy my Scala/Spark application as a fat JAR on Dataproc, built with sbt.
When I call:
val client: SecretManagerServiceClient = SecretManagerServiceClient.create()
I get the following error on Dataproc:
Using the default container image
Waiting for container log creation
PYSPARK_PYTHON=/opt/dataproc/conda/bin/python
Generating /home/spark/.pip/pip.conf
Configuring index-url as 'https://europe-python.pkg.dev/artifact-registry-python-cache/virtual-python/simple/'
JAVA_HOME=/usr/lib/jvm/temurin-17-jdk-amd64
SPARK_EXTRA_CLASSPATH=
:: loading settings :: file = /etc/spark/conf/ivysettings.xml
24/08/30 13:35:09 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-google-hadoop-file-system.properties,hadoop-metrics2.properties
24/08/30 13:35:09 INFO MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
24/08/30 13:35:09 INFO MetricsSystemImpl: google-hadoop-file-system metrics system started
Exception in thread "main" java.lang.NoSuchMethodError: 'io.grpc.MethodDescriptor$Marshaller io.grpc.protobuf.ProtoUtils.marshaller(repackaged.com.google.protobuf.Message)'
at com.google.cloud.secretmanager.v1.stub.GrpcSecretManagerServiceStub.<clinit>(GrpcSecretManagerServiceStub.java:72)
at com.google.cloud.secretmanager.v1.stub.SecretManagerServiceStubSettings.createStub(SecretManagerServiceStubSettings.java:350)
at com.google.cloud.secretmanager.v1.SecretManagerServiceClient.<init>(SecretManagerServiceClient.java:455)
at com.google.cloud.secretmanager.v1.SecretManagerServiceClient.create(SecretManagerServiceClient.java:437)
at com.google.cloud.secretmanager.v1.SecretManagerServiceClient.create(SecretManagerServiceClient.java:428)
at fr.mycomp.graphuser.GraphUserApp$.getCredentials(GraphUserApp.scala:30)
at fr.mycomp.graphuser.GraphUserApp$.delayedEndpoint$fr$mycomp$graphuser$GraphUserApp$1(GraphUserApp.scala:95)
at fr.mycomp.graphuser.GraphUserApp$delayedInit$body.apply(GraphUserApp.scala:15)
at scala.Function0.apply$mcV$sp(Function0.scala:39)
at scala.Function0.apply$mcV$sp$(Function0.scala:39)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)
at scala.App.$anonfun$main$1$adapted(App.scala:80)
at scala.collection.immutable.List.foreach(List.scala:431)
at scala.App.main(App.scala:80)
at scala.App.main$(App.scala:78)
at fr.mycomp.graphuser.GraphUserApp$.main(GraphUserApp.scala:15)
at fr.mycomp.graphuser.GraphUserApp.main(GraphUserApp.scala)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:569)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1032)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:194)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:217)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1124)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1133)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
ERROR: (gcloud.dataproc.batches.submit.spark) Batch job is FAILED. Detail: Job failed with message [Exception in thread "main" java.lang.NoSuchMethodError: 'io.grpc.MethodDescriptor$Marshaller io.grpc.protobuf.ProtoUtils.marshaller(repackaged.com.google.protobuf.Message)']
Here is my build.sbt:
val sparkVersion = settingKey[String]("Spark version")

lazy val root = (project in file("."))
  .settings(
    inThisBuild(List(
      organization := "fr.mycomp",
      scalaVersion := "2.12.13"
    )),
    name := "graphUser",
    version := "0.0.1",
    sparkVersion := "3.5.0",
    javacOptions ++= Seq("-source", "1.8", "-target", "1.8"),
    javaOptions ++= Seq("-Xms512M", "-Xmx2048M"),
    scalacOptions ++= Seq("-deprecation", "-unchecked"),
    parallelExecution in Test := false,
    fork := true,
    coverageHighlighting := true,
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % sparkVersion.value exclude("com.google.protobuf", "protobuf-java"),
      "org.apache.spark" %% "spark-sql" % sparkVersion.value exclude("com.google.protobuf", "protobuf-java"),
      "org.apache.spark" %% "spark-graphx" % sparkVersion.value exclude("com.google.protobuf", "protobuf-java"),
      // Snowflake Connector for Spark
      "net.snowflake" % "spark-snowflake_2.12" % "3.0.0",
      "net.snowflake" % "snowflake-jdbc" % "3.17.0",
      // https://storage.googleapis.com/cloud-opensource-java-dashboard/com.google.cloud/libraries-bom/26.45.0/index.html
      "com.google.cloud" % "google-cloud-storage" % "2.40.1",
      "com.google.cloud" % "google-cloud-secretmanager" % "2.48.0",
      "com.google.protobuf" % "protobuf-java" % "3.25.4",
      "com.google.protobuf" % "protobuf-java-util" % "3.25.4",
      "io.grpc" % "grpc-all" % "1.66.0",
      "io.grpc" % "grpc-protobuf" % "1.66.0",
      "io.grpc" % "grpc-okhttp" % "1.66.0",
      "io.grpc" % "grpc-protobuf-lite" % "1.66.0",
      "io.grpc" % "grpc-stub" % "1.66.0",
      // PureConfig for configuration management
      "com.github.pureconfig" %% "pureconfig" % "0.17.6",
      // SLF4J logging dependencies
      "org.slf4j" % "slf4j-api" % "2.0.9",
      "org.slf4j" % "slf4j-log4j12" % "2.0.9",
      // Log4j logging dependencies
      "org.apache.logging.log4j" % "log4j-api" % "2.23.1",
      "org.apache.logging.log4j" % "log4j-core" % "2.23.1",
      // JSON4S for JSON parsing
      "org.json4s" %% "json4s-native" % "3.6.6",
      "org.json4s" %% "json4s-jackson" % "3.6.6",
      "org.json4s" %% "json4s-core" % "3.6.6",
      "org.json4s" %% "json4s-ast" % "3.6.6",
      "org.json4s" %% "json4s-scalap" % "3.6.6",
      // Testing dependencies
      "org.scalatest" %% "scalatest" % "3.2.19" % Test,
      "org.scalacheck" %% "scalacheck" % "1.18.0" % Test,
      "com.holdenkarau" %% "spark-testing-base" % s"${sparkVersion.value}_1.5.3" % Test
    ),
    // Configure the run task
    run in Compile := Defaults.runTask(fullClasspath in Compile, mainClass in (Compile, run), runner in (Compile, run)).evaluated,
    // Additional build settings
    scalacOptions ++= Seq("-deprecation", "-unchecked"),
    pomIncludeRepository := { x => false },
    // Repositories for dependencies
    resolvers ++= Seq(
      "sonatype-releases" at "https://oss.sonatype.org/content/repositories/releases/",
      "Typesafe repository" at "https://repo.typesafe.com/typesafe/releases/",
      "Second Typesafe repo" at "https://repo.typesafe.com/typesafe/maven-releases/",
      "Spark Packages Repo" at "https://repos.spark-packages.org/",
      "Maven Central" at "https://repo1.maven.org/maven2/",
      Resolver.sonatypeRepo("public")
    ),
    // Publishing settings
    publishTo := {
      val nexus = "https://oss.sonatype.org/"
      if (isSnapshot.value)
        Some("snapshots" at nexus + "content/repositories/snapshots")
      else
        Some("releases" at nexus + "service/local/staging/deploy/maven2")
    }
  )
import sbtassembly.AssemblyPlugin.autoImport._
// Strategy for handling conflicts in META-INF during assembly
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}
// Shade rules to prevent conflicts in shaded libraries
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.common.**" -> "repackaged.com.google.common.@1").inAll,
  ShadeRule.rename("com.google.protobuf.**" -> "repackaged.com.google.protobuf.@1").inAll,
  // ShadeRule.rename("io.grpc.**" -> "repackaged.io.grpc.@1").inAll
)
I tried following the dependency versions from the Google libraries BOM, and I also tried switching to Maven, but the problem is still the same...
I use this command to launch my app:
gcloud dataproc batches submit spark --batch=graphuser-$(date '+%Y%m%d')-$(date +%s) --region=europe-west4 --subnet=dataproc-subnet-eu-west4 --version=1.2 --properties=spark.dataproc.scaling.version=2,spark.dynamicAllocation.enabled=true,spark.dynamicAllocation.initialExecutors=4,spark.dynamicAllocation.minExecutors=4,spark.sql.shuffle.partitions=500,spark.executor.memory=12g,spark.dynamicAllocation.executorAllocationRatio=0.5,spark.dynamicAllocation.maxExecutors=100,spark.driver.memory=9g --ttl=12h --jar=gs://dataproc-data-eng/user_ids/graphUser-assembly-0.0.1.jar