user1052610

Reputation: 4719

Which jar contains org.apache.spark.sql.api.java.JavaSQLContext?

The following dependency is in the pom:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.3.0</version>
</dependency>

I expect the jar to contain the following class:

org.apache.spark.sql.api.java.JavaSQLContext

but while it does contain the package org.apache.spark.sql.api.java, all that package appears to contain are the interfaces UDF1 through UDF22.

Which is the correct dependency to get JavaSQLContext?

Thanks.

Upvotes: 2

Views: 10658

Answers (3)

Johan Witters

Reputation: 1636

I had the same problem, and it was because I was looking at the wrong version of the documentation.

My understanding from the latest documentation - https://spark.apache.org/docs/latest/sql-programming-guide.html#loading-data-programmatically - is to use something like this (adapted from the doc, with imports and initialization filled in):

import java.util.List;

import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;

SQLContext sqlContext = new SQLContext(sc);                      // sc is an existing JavaSparkContext
DataFrame schemaPeople = sqlContext.read().json("people.json");  // stands in for the DataFrame from the previous example in the guide

// DataFrames can be saved as Parquet files, maintaining the schema information.
schemaPeople.write().parquet("people.parquet");

// Read in the Parquet file created above. Parquet files are self-describing, so the schema is preserved.
// The result of loading a Parquet file is also a DataFrame.
DataFrame parquetFile = sqlContext.read().parquet("people.parquet");

// Parquet files can also be registered as tables and then used in SQL statements.
parquetFile.registerTempTable("parquetFile");
DataFrame teenagers = sqlContext.sql("SELECT name FROM parquetFile WHERE age >= 13 AND age <= 19");
List<String> teenagerNames = teenagers.javaRDD().map(new Function<Row, String>() {
  public String call(Row row) {
    return "Name: " + row.getString(0);
  }
}).collect();

Upvotes: 0

Tim Biegeleisen

Reputation: 522499

From a cursory search, it appears that the class org.apache.spark.sql.api.java.JavaSQLContext is only present in versions 1.2 and earlier of the spark-sql JAR. The code you are working with is likely written against that older dependency. You have two choices at this point: either update your code to the newer API, or downgrade the spark-sql JAR. The former option is probably the better one.

If you insist on keeping your code the same, then including the following dependency in your POM should fix the problem (a sketch of the old API it restores follows the dependency):

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.2.2</version>
</dependency>
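
This is roughly what that older API looks like (a minimal sketch; sc and people.json are illustrative placeholders, not taken from the question):

import org.apache.spark.sql.api.java.JavaSQLContext;
import org.apache.spark.sql.api.java.JavaSchemaRDD;

JavaSQLContext javaSqlContext = new JavaSQLContext(sc);        // sc is an existing JavaSparkContext
JavaSchemaRDD people = javaSqlContext.jsonFile("people.json"); // hypothetical input file
people.registerTempTable("people");
JavaSchemaRDD adults = javaSqlContext.sql("SELECT name FROM people WHERE age >= 18");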

If you want to upgrade your code, see the answer given by @DB5.

Upvotes: 1

DB5

Reputation: 13998

The JavaSQLContext class was removed as of version 1.3.0. You should use the org.apache.spark.sql.SQLContext class instead; a short migration sketch follows the quoted documentation below. The documentation states the following:

Prior to Spark 1.3 there were separate Java compatible classes (JavaSQLContext and JavaSchemaRDD) that mirrored the Scala API. In Spark 1.3 the Java API and Scala API have been unified. Users of either language should use SQLContext and DataFrame. In general these classes try to use types that are usable from both languages (i.e. Array instead of language-specific collections). In some cases where no common type exists (e.g., for passing in closures or Maps) function overloading is used instead.

Additionally, the Java-specific types API has been removed. Users of both Scala and Java should use the classes present in org.apache.spark.sql.types to describe schema programmatically.
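
As a rough illustration of the migration, here is a minimal Spark 1.3 sketch that replaces JavaSQLContext with SQLContext (sc and people.json are assumed placeholders, not from the question):

import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

SQLContext sqlContext = new SQLContext(sc);             // sc is an existing JavaSparkContext; replaces new JavaSQLContext(sc)
DataFrame people = sqlContext.jsonFile("people.json");  // returns a DataFrame instead of a JavaSchemaRDD
people.registerTempTable("people");
DataFrame adults = sqlContext.sql("SELECT name FROM people WHERE age >= 18");
adults.show();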

As an aside, if you want to find out which JARs contain a specific class, you can use the Advanced Search on Maven Central and search "By Classname". Here is the search for JavaSQLContext: http://search.maven.org/#search|ga|1|fc%3A%22org.apache.spark.sql.api.java.JavaSQLContext%22

Upvotes: 4
