Reputation: 5735
I would like to iterate through the content of a column in a Spark DataFrame and correct the data within a cell if it meets a certain condition:
+------------+
|column_title|
+------------+
|        null|
|           0|
|           1|
+------------+
Let's say I want to display something else when the value of the column is null. I tried with
Column.when()
Dataset.withColumn()
but I can't find the right method. I don't think it should be necessary to convert to an RDD and iterate through it.
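For reference, a DataFrame like the one above can be built as follows (a minimal sketch; only the column name column_title comes from the output shown, the rest is illustrative):
import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

SparkSession spark = SparkSession.builder().master("local[*]").getOrCreate();

// single nullable string column, matching the output shown above
StructType schema = new StructType().add("column_title", DataTypes.StringType);

Dataset<Row> df = spark.createDataFrame(
        Arrays.asList(
                RowFactory.create((String) null),
                RowFactory.create("0"),
                RowFactory.create("1")),
        schema);

df.show();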
Upvotes: 4
Views: 1377
Reputation: 526
Another way of doing this is to use a UDF.
Create the UDF:
import org.apache.commons.lang3.StringUtils;
import org.apache.spark.sql.api.java.UDF1;

private static UDF1<String, String> myUdf = new UDF1<String, String>() {
    @Override
    public String call(final String str) throws Exception {
        // any condition or custom function can be used here
        return StringUtils.rightPad(str, 25, 'A');
    }
};
Register the UDF with the SparkSession:
sparkSession.udf().register("myUdf", myUdf, DataTypes.StringType);
Apply the UDF to the dataset:
dataset = dataset.withColumn("city", functions.callUDF("myUdf", col("city")));
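Note that StringUtils.rightPad returns null for a null input, so for the null case from the question the UDF has to handle null explicitly. A minimal sketch (the replacement value "something else" and the UDF name are just placeholders):
private static UDF1<String, String> replaceNullUdf = new UDF1<String, String>() {
    @Override
    public String call(final String str) throws Exception {
        // replace nulls, leave every other value unchanged
        return str == null ? "something else" : str;
    }
};

sparkSession.udf().register("replaceNullUdf", replaceNullUdf, DataTypes.StringType);
dataset = dataset.withColumn("city", functions.callUDF("replaceNullUdf", col("city")));
A UDF is a black box to the Catalyst optimizer, so the built-in when/otherwise and na().fill() approaches are generally preferable when they are enough; a UDF is mainly useful for custom logic like the padding above.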
Upvotes: 1
Reputation: 15297
You can use when and equalTo, or when and isNull.
Dataset<Row> df1 = df.withColumn("value", when(col("value").equalTo("bbb"), "ccc").otherwise(col("value")));
Dataset<Row> df2 = df.withColumn("value", when(col("value").isNull(), "ccc").otherwise(col("value")));
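Several conditions can also be chained in a single expression, assuming the usual static import of org.apache.spark.sql.functions.*; the replacement values here are just placeholders:
Dataset<Row> df4 = df.withColumn("value",
        when(col("value").isNull(), "ccc")
        .when(col("value").equalTo("bbb"), "ddd")
        .otherwise(col("value")));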
If you only want to replace null values, you can also use na and fill.
Dataset<Row> df3 = df.na().fill("ccc");
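fill also accepts a list of column names if only some columns should be touched (a sketch, with the column name taken from the examples above):
Dataset<Row> df5 = df.na().fill("ccc", new String[] { "value" });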
Upvotes: 4