JBoy

Reputation: 5735

Spark Java edit data in column

I would like to iterate through the contents of a column in a Spark DataFrame and correct the value in a cell if it meets a certain condition.

+-------------+
|column_title |
+-------------+
|null         |
|0            |
|1            |
+-------------+

Let's say I want to display something else when the value of the column is null. I tried with

Column.when() and Dataset.withColumn()

But I can't find the right way to combine them. I don't think it should be necessary to convert to an RDD and iterate through it.

Upvotes: 4

Views: 1377

Answers (2)

Aryan087

Reputation: 526

Another way of doing this is to use a UDF.

Create a UDF:

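import org.apache.commons.lang3.StringUtils; // assumption: StringUtils is Apache Commons Lang 3
import org.apache.spark.sql.api.java.UDF1;

private static final UDF1<String, String> myUdf = new UDF1<String, String>() {
  @Override
  public String call(final String str) throws Exception {
    // any condition or custom function can be used here;
    // note that rightPad returns null when str is null
    return StringUtils.rightPad(str, 25, 'A');
  }
};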

Register UDF in SparkSession:

sparkSession.udf().register("myUdf", myUdf, DataTypes.StringType);

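Apply the UDF to the dataset (the name must match the one used at registration; col is statically imported from org.apache.spark.sql.functions):

dataset = dataset.withColumn("city", functions.callUDF("myUdf", col("city")));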
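
For the null case in the question, the UDF body has to handle null itself, because a null cell in a string column reaches call() as a Java null. The logic can be written and checked as plain Java before it is wrapped in UDF1. This is only a minimal sketch: the class name NullReplacer and the placeholder "n/a" are hypothetical and not taken from the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class NullReplacer {
    // Hypothetical placeholder shown instead of null; pick whatever fits your data.
    static final String PLACEHOLDER = "n/a";

    // Returns the placeholder for null input and the value unchanged otherwise.
    static String replaceNull(String value) {
        return value == null ? PLACEHOLDER : value;
    }

    public static void main(String[] args) {
        // Same three values as the column in the question.
        List<String> column = Arrays.asList(null, "0", "1");
        List<String> fixed = column.stream()
                .map(NullReplacer::replaceNull)
                .collect(Collectors.toList());
        System.out.println(fixed); // prints [n/a, 0, 1]
    }
}
```

Inside Spark, replaceNull would simply become the body of call() in the UDF1 shown above, and the register and withColumn steps stay the same.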

Upvotes: 1

abaghel

Reputation: 15297

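You can use when with equalTo, or when with isNull, and add otherwise so that all other values are kept unchanged. The snippets below assume static imports of col and when from org.apache.spark.sql.functions.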

Dataset<Row> df1 = df.withColumn("value", when(col("value").equalTo("bbb"), "ccc").otherwise(col("value")));

Dataset<Row> df2 = df.withColumn("value", when(col("value").isNull(), "ccc").otherwise(col("value")));

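If you only want to replace null values, you can also use na and fill. Note that fill with a string value replaces nulls in every string column of the DataFrame; to restrict it to specific columns, pass their names as a second argument, e.g. df.na().fill("ccc", new String[] {"value"}).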

Dataset<Row> df3 = df.na().fill("ccc");

Upvotes: 4
