Reputation: 1117
I have the below spark dataset/dataframe. I have to create a new column diff_col by finding the difference between absolute values of col_2 and col_3
col_1 col_2 col_3
A 5 3
B null -2
C 2 null
D null null
E 3 1
F 4 -2
Expected output:
col_1 col_2 col_3 diff_col
A 5 3 2
B null -2 -2
C 2 null 2
D null null 0
E 3 1 2
F 4 -2 2
Not sure how to do this in java spark. In pyspark we can do by replacing null with lit(0) and then col(col_2) - col(col_3). Is there a java spark equivalent for this?
Upvotes: 0
Views: 1287
Reputation: 1800
You can use coalese [1] and lit [2] since you have only two columns.
private Column absoluteBetweenTwo(final Dataset<Row> ds) {
final Column col_1 = functions.coalesce(ds.col("col_1"), functions.lit(0));
final Column col_2 = functions.coalesce(ds.col("col_2"), functions.lit(0));
return functions.abs(col_1.minus(col_2));
}
And than add it as column
ds.withColumn("col_3", this.absoluteBetweenTwo(ds));
The alternative solution is to wrap your Dataset
explictitly in DataFrameNaFunctions
[3], and use fill
otherwise not acceptable from Java API.
fill
will replace all NULL
with provided value for columns of corresponding type.
return new DataFrameNaFunctions(ds).fill(0L);
3: https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrameNaFunctions.html
Upvotes: 1
Reputation: 10362
Use abs
,nvl
SQL functions inside expr
function.
df
.withColumn(
"diff_col",
expr("(abs(nvl(col_2,0)) - abs(nvl(col_3,0)))")
)
.show(false)
Output
+-----+-----+-----+--------+
|col_1|col_2|col_3|diff_col|
+-----+-----+-----+--------+
|A |5 |3 |2 |
|B |null |-2 |-2 |
|C |2 |null |2 |
|D |null |null |0 |
|E |3 |1 |2 |
|F |4 |-2 |2 |
+-----+-----+-----+--------+
Upvotes: 0