Vikas Gite

Reputation: 325

How to remove a record from Spark DataSet

I am creating a Dataset like this:

SparkSession spark = JavaSparkSessionSingleton.getInstance(javaStreamingContext.sparkContext().getConf());
Dataset<Row> journyDF = spark.createDataFrame(journyDataJavaRDD, JournyData.class);

"journyDF" has a column "longitude". If the value of that column is 0, I want to remove that row from "journyDF" so that it is skipped in further processing.

Is there a method that can do that?

Upvotes: 0

Views: 293

Answers (1)

DavidW

Reputation: 1421

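The simplest approach is probably Dataset.filter(), which returns a new Dataset containing only the rows that satisfy a condition. With a SQL expression string, something like

Dataset<Row> journyDF = spark.createDataFrame(journyDataJavaRDD, JournyData.class).filter("longitude != 0");

or, with a Column expression (this needs import static org.apache.spark.sql.functions.col;),

[...].filter(col("longitude").notEqual(0));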

(You don't specify the type of the column, so you may need to adjust this.)
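
For completeness, here is a minimal self-contained sketch of the filter approach. It assumes a local SparkSession and a hypothetical JournyData bean with a single double field named longitude; the real class in the question is not shown, so the bean, the sample values, and the helper name dropZeroLongitude are illustrative only.

```java
import static org.apache.spark.sql.functions.col;

import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class LongitudeFilterExample {

    // Hypothetical stand-in for the asker's JournyData bean (assumed shape).
    public static class JournyData implements Serializable {
        private double longitude;

        public JournyData() {}

        public JournyData(double longitude) {
            this.longitude = longitude;
        }

        public double getLongitude() {
            return longitude;
        }

        public void setLongitude(double longitude) {
            this.longitude = longitude;
        }
    }

    // Returns a new Dataset without the rows whose longitude equals 0.
    // The input Dataset is not modified; Datasets are immutable.
    public static Dataset<Row> dropZeroLongitude(Dataset<Row> df) {
        return df.filter(col("longitude").notEqual(0));
    }

    public static void main(String[] args) {
        // Local session for illustration only; the question obtains its
        // session from a streaming context instead.
        SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("longitude-filter-example")
                .getOrCreate();

        // Made-up sample data: one row has longitude 0 and should be dropped.
        List<JournyData> data = Arrays.asList(
                new JournyData(0.0),
                new JournyData(73.85),
                new JournyData(-0.12));
        Dataset<Row> journyDF = spark.createDataFrame(data, JournyData.class);

        Dataset<Row> filtered = dropZeroLongitude(journyDF);
        filtered.show();

        spark.stop();
    }
}
```

Because filter() returns a new Dataset rather than changing the existing one, the result has to be assigned and used for the later processing steps. If the column can contain nulls (for example a boxed Double field), those rows are also dropped, since the comparison evaluates to null rather than true.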

Upvotes: 1
