Jae Kim

Reputation: 145

Can't find jar for missing class

Can't find the jar that has org.apache.spark.sql.Row class

I opened up the jar file spark-sql_2.11-2.4.3.jar, but the org.apache.spark.sql.Row class is not there, even though the Spark documentation says it should be: https://spark.apache.org/docs/2.1.1/api/java/org/apache/spark/sql/Row.html
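For anyone wanting to double-check the same thing, here is a small sketch of how to test whether a jar actually contains a given class (the jar filename below is just the one from the question; point the path at wherever your build tool caches jars, which is an assumption about your local setup):

```scala
import java.util.zip.ZipFile

// Returns true if the jar contains the .class file for the given class name.
def containsClass(jarPath: String, className: String): Boolean = {
  val entryName = className.replace('.', '/') + ".class"
  val zip = new ZipFile(jarPath)
  try {
    val entries = zip.entries
    var found = false
    while (entries.hasMoreElements)
      if (entries.nextElement.getName == entryName) found = true
    found
  } finally zip.close()
}

// Hypothetical local path; adjust to your Maven/Ivy cache:
// containsClass("spark-sql_2.11-2.4.3.jar", "org.apache.spark.sql.Row")
```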

import org.apache.spark.sql.SparkSession
import com.microsoft.azure.sqldb.spark.config.Config
import com.microsoft.azure.sqldb.spark.connect._

object BulkCopy extends App{
  val spark = SparkSession
    .builder()
    .appName("Spark SQL data sources example")
    .config("spark.some.config.option", "some-value")
    .getOrCreate()
  var df = spark.read.parquet("parquet")

  val bulkCopyConfig = com.microsoft.azure.sqldb.spark.config.Config(Map(
    "url"            -> jdbcHostname,
    "databaseName"   -> jdbcDatabase,
    "user"           -> jdbcUsername,
    "password"       -> jdbcPassword,
    "dbTable"        -> "dbo.RAWLOG_3_1_TEST1",
    "bulkCopyBatchSize" -> "2500",
    "bulkCopyTableLock" -> "true",
    "bulkCopyTimeout"   -> "600"
  ))

  df.bulkCopyToSqlDB(bulkCopyConfig)
}

Error:(17, 13) Symbol 'type org.apache.spark.sql.Row' is missing from the classpath.
This symbol is required by 'type org.apache.spark.sql.DataFrame'.
Make sure that type Row is in your classpath and check for conflicting dependencies with `-Ylog-classpath`.
A full rebuild may help if 'package.class' was compiled against an incompatible version of org.apache.spark.sql.
   var df = spark.read.parquet("parquet")
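As a side note, the `-Ylog-classpath` flag the error mentions can be turned on from build.sbt (a sketch; it makes scalac print the classpath it actually compiles against, which helps spot a missing or conflicting jar):

```scala
// build.sbt (sketch): make scalac log the classpath it compiles against
scalacOptions += "-Ylog-classpath"
```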

Upvotes: 4

Views: 1533

Answers (2)

Vivek Sethi

Reputation: 974

The org.apache.spark.sql.Row class is not part of spark-sql_2.11-2.4.3.jar; you can find it in spark-catalyst_2.11-2.4.3.jar instead. The spark-sql library dependency below depends on the spark-catalyst lib, and your build tool (Maven/sbt) should resolve that automatically for you:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.4.3</version>
</dependency>

OR

libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.3"

Here are the dependencies for the spark-sql lib: [screenshot of the spark-sql dependency tree]
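Once the dependency resolves, one way to confirm where Row actually comes from at runtime is to ask the classloader. A sketch (scala.Option is used as a stand-in below, since Spark isn't assumed to be on this classpath; substitute "org.apache.spark.sql.Row" inside your project):

```scala
// Sketch: report which jar (if any) a class was loaded from.
// Bootstrap classes like java.lang.String have no code source, hence Option.
def jarOf(className: String): Option[String] = {
  val src = Class.forName(className).getProtectionDomain.getCodeSource
  Option(src).map(_.getLocation.toString)
}

println(jarOf("scala.Option"))      // e.g. Some(file:/.../scala-library.jar)
println(jarOf("java.lang.String"))  // None (loaded by the bootstrap loader)
```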

Upvotes: 2
