Reputation: 325
I am creating a Dataset like this:
SparkSession spark = JavaSparkSessionSingleton.getInstance(javaStreamingContext.sparkContext().getConf());
Dataset<Row> journyDF = spark.createDataFrame(journyDataJavaRDD, JournyData.class);
"journyDF" has a column "longitude". If the value of that column is 0, I want to remove that row from "journyDF" so that it is skipped in further processing.
Is there a method which can do that?
Upvotes: 0
Views: 293
Reputation: 1421
The simplest approach would appear to be Dataset.filter(), so something like
Dataset<Row> journyDF = spark.createDataFrame(journyDataJavaRDD, JournyData.class).filter("longitude != 0");
or perhaps
[...].filter(col("longitude").notEqual(0));
(Here col is the static method org.apache.spark.sql.functions.col. You don't specify the type of the column, so you may need to adjust the comparison.)
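For reference, here is a minimal self-contained sketch of the Column-based variant. It assumes Spark SQL is on the classpath and runs against a local session. The JournyData bean below is a hypothetical stand-in for the class in the question (the real one is not shown), and dropZeroLongitude is just an illustrative helper name.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.col;

public class LongitudeFilterSketch {

    // Hypothetical stand-in for the question's JournyData bean (assumption:
    // the real class exposes a numeric "longitude" property).
    public static class JournyData implements Serializable {
        private double longitude;
        private double latitude;

        public JournyData() {}

        public JournyData(double longitude, double latitude) {
            this.longitude = longitude;
            this.latitude = latitude;
        }

        public double getLongitude() { return longitude; }
        public void setLongitude(double longitude) { this.longitude = longitude; }
        public double getLatitude() { return latitude; }
        public void setLatitude(double latitude) { this.latitude = latitude; }
    }

    // Returns a new Dataset containing only the rows whose longitude is not 0.
    // The input Dataset is not modified; Datasets are immutable.
    static Dataset<Row> dropZeroLongitude(Dataset<Row> df) {
        return df.filter(col("longitude").notEqual(0));
    }

    public static void main(String[] args) {
        // Local session for illustration only; the question obtains its
        // session from a streaming context instead.
        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("longitude-filter-sketch")
                .getOrCreate();

        List<JournyData> data = Arrays.asList(
                new JournyData(0.0, 51.5),
                new JournyData(13.4, 52.5),
                new JournyData(2.35, 48.85));

        Dataset<Row> journyDF = spark.createDataFrame(data, JournyData.class);
        Dataset<Row> filtered = dropZeroLongitude(journyDF);

        filtered.show();
        spark.stop();
    }
}
```

Note that filter() keeps only rows where the condition evaluates to true, so if the column is nullable, rows with a null longitude are dropped as well.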
Upvotes: 1