Jason Shu

Reputation: 139

Why can't I import org.apache.spark.sql.DataFrame?

I have the Maven dependencies spark-sql_2.1.0 and spark-hive_2.1.0. However, when I try to import org.apache.spark.sql.DataFrame, I get an error, while importing org.apache.spark.sql.SQLContext works without any errors. Why?

Upvotes: 5

Views: 20962

Answers (2)

T. Gawęda

Reputation: 16066

In Spark 2.x, DataFrame became a Scala type alias: type DataFrame = Dataset[Row]. Java has no type aliases, so DataFrame is not available in Java. Use the type Dataset<Row> instead, and import both org.apache.spark.sql.Dataset and org.apache.spark.sql.Row.

Upvotes: 10

Ramesh Maharjan

Reputation: 41957

 import org.apache.spark.sql.DataFrame

works in Scala but not in Java, because in Spark 2.x DataFrame is only a Scala type alias and has no corresponding Java class. In Java, use Dataset<Row> instead, as explained in the Spark SQL, DataFrames and Datasets Guide.

You can import the following

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

and use them as

Dataset<Row> peopleDataFrame = spark.createDataFrame(rowRDD, schema);

Or

Dataset<Row> peopleDF = spark.createDataFrame(peopleRDD, Person.class);

Or

Dataset<Row> usersDF = spark.read().load("examples/src/main/resources/users.parquet");
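
The fragments above assume that spark, rowRDD, schema and so on already exist. As a rough, self-contained sketch of the first variant, something like the following should work, assuming Spark 2.x is on the classpath. The class name DatasetRowExample, the helper buildPeople, the column names and the local master setting are all made up for illustration and are not part of the question.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class DatasetRowExample {

    // Hypothetical helper: builds a two-row Dataset<Row> from in-memory data.
    // The "name" and "age" columns are illustrative assumptions.
    static Dataset<Row> buildPeople(SparkSession spark) {
        StructType schema = DataTypes.createStructType(Arrays.asList(
                DataTypes.createStructField("name", DataTypes.StringType, false),
                DataTypes.createStructField("age", DataTypes.IntegerType, false)));

        List<Row> rows = Arrays.asList(
                RowFactory.create("Alice", 30),
                RowFactory.create("Bob", 25));

        // In Java the declared type is Dataset<Row>; there is no DataFrame class to import.
        return spark.createDataFrame(rows, schema);
    }

    public static void main(String[] args) {
        // Local session for experimentation only (assumed setup).
        SparkSession spark = SparkSession.builder()
                .appName("DatasetRowExample")
                .master("local[*]")
                .getOrCreate();

        Dataset<Row> people = buildPeople(spark);
        people.printSchema();
        people.show();

        spark.stop();
    }
}
```

Note that the method is still called createDataFrame even though its Java return type is Dataset<Row>.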

Upvotes: 5
