Reputation: 4719
The following dependency is in the pom:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.10</artifactId>
<version>1.3.0</version>
</dependency>
I expect the jar to contain the following class:
org.apache.spark.sql.api.java.JavaSQLContext
but while it contains the package org.apache.spark.sql.api.java, all that package appears to contain are interfaces named UDF1 through UDF22.
Which is the correct dependency to get JavaSQLContext?
Thanks.
Upvotes: 2
Views: 10658
Reputation: 1636
I had the same problem, and it was because I was looking at the wrong version of the documentation.
My understanding from the latest documentation - https://spark.apache.org/docs/latest/sql-programming-guide.html#loading-data-programmatically - is to use something like this (copied from the doc):
SQLContext sqlContext = null; // initialise this, e.g. new SQLContext(sc) - see the setup sketch below
DataFrame schemaPeople = null; // The DataFrame from the previous example.
// DataFrames can be saved as Parquet files, maintaining the schema information.
schemaPeople.write().parquet("people.parquet");
// Read in the Parquet file created above. Parquet files are self-describing so the schema is preserved.
// The result of loading a parquet file is also a DataFrame.
DataFrame parquetFile = sqlContext.read().parquet("people.parquet");
// Parquet files can also be registered as tables and then used in SQL statements.
parquetFile.registerTempTable("parquetFile");
DataFrame teenagers = sqlContext.sql("SELECT name FROM parquetFile WHERE age >= 13 AND age <= 19");
List<String> teenagerNames = teenagers.javaRDD().map(new Function<Row, String>() {
public String call(Row row) {
return "Name: " + row.getString(0);
}
}).collect();
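For completeness, here is a minimal setup for the two placeholders above. This is my own sketch rather than part of the guide: the app name, master and JSON path are illustrative, and read()/write() require Spark 1.4 or later. The snippet above also needs org.apache.spark.sql.Row, org.apache.spark.api.java.function.Function and java.util.List among its imports.
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

SparkConf conf = new SparkConf().setAppName("ParquetExample").setMaster("local[*]"); // illustrative
JavaSparkContext sc = new JavaSparkContext(conf);
SQLContext sqlContext = new SQLContext(sc);

// Any supported data source works here; people.json ships with the Spark examples.
DataFrame schemaPeople = sqlContext.read().json("examples/src/main/resources/people.json");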
Upvotes: 0
Reputation: 522499
From a cursory search, it appears that the class org.apache.spark.sql.api.java.JavaSQLContext only appears in the 1.2 versions and earlier of the spark-sql JAR. It is likely that the code you are working with is also using this older dependency. You have two choices at this point: you can either upgrade your code, or you can downgrade the spark-sql JAR. You probably want to go with the former option.
If you insist on keeping your code the same, then including the following dependency in your POM should fix the problem:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.10</artifactId>
<version>1.2.2</version>
</dependency>
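With the 1.2.x artifact, the old Java-specific API is used roughly like this (a sketch only; the app name, master and Parquet file name are illustrative):
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.api.java.JavaSQLContext;
import org.apache.spark.sql.api.java.JavaSchemaRDD;

JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("OldJavaApi").setMaster("local[*]"));
// JavaSQLContext and JavaSchemaRDD exist only up to the 1.2.x line.
JavaSQLContext sqlContext = new JavaSQLContext(sc);
JavaSchemaRDD parquetFile = sqlContext.parquetFile("people.parquet");
parquetFile.registerTempTable("parquetFile");
JavaSchemaRDD teenagers = sqlContext.sql("SELECT name FROM parquetFile WHERE age >= 13 AND age <= 19");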
If you want to upgrade your code instead, see the answer given by @DB5.
Upvotes: 1
Reputation: 13998
The JavaSQLContext class has been removed from version 1.3.0 onwards. You should use the org.apache.spark.sql.SQLContext class instead. The documentation states the following:
Prior to Spark 1.3 there were separate Java compatible classes (JavaSQLContext and JavaSchemaRDD) that mirrored the Scala API. In Spark 1.3 the Java API and Scala API have been unified. Users of either language should use SQLContext and DataFrame. In general these classes try to use types that are usable from both languages (i.e. Array instead of language specific collections). In some cases where no common type exists (e.g., for passing in closures or Maps) function overloading is used instead.
Additionally the Java specific types API has been removed. Users of both Scala and Java should use the classes present in org.apache.spark.sql.types to describe schema programmatically.
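In practice the change to your Java code is small. A minimal sketch (the app name, master and JSON path are illustrative; jsonFile is the 1.3.0 call, replaced by read().json(...) from 1.4 onwards):
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

SparkConf conf = new SparkConf().setAppName("UnifiedApiExample").setMaster("local[*]"); // illustrative
JavaSparkContext sc = new JavaSparkContext(conf);

// The unified SQLContext is usable from Java directly; no Java-specific context is needed.
SQLContext sqlContext = new SQLContext(sc);

// jsonFile is the 1.3.0 API; from 1.4 onwards use sqlContext.read().json(...).
DataFrame people = sqlContext.jsonFile("examples/src/main/resources/people.json");
people.registerTempTable("people");
DataFrame teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19");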
As an aside, if you want to find out which JARs contain a specific class, you can use the Advanced Search of Maven Central and search "By Classname". So here is the search for JavaSQLContext: http://search.maven.org/#search|ga|1|fc%3A%22org.apache.spark.sql.api.java.JavaSQLContext%22
Upvotes: 4