mjsee

Reputation: 109

Spark get value form WrappedArray<WrappedArray<Double>> in UDF Java

I have a column in my Dataset&lt;Row> that contains a WrappedArray&lt;WrappedArray&lt;Double>>. I am passing this column to a UDF to pull out one of the values.

How would I go about getting access to the Doubles in this nested structure?

I want to do something like this:

sparkSession.udf().register(ADD_START_TOTAL, (UDF1<WrappedArray<WrappedArray<Double>>, Double>) (totals) -> totals[0][1], DataTypes.DoubleType);

Here is an example of what the column looks like when I invoke the Dataset.show() method:

[WrappedArray(2.0...

EDIT: Found this post: How to cast a WrappedArray[WrappedArray[Float]] to Array[Array[Float]] in spark (scala), but I am not sure how to translate it to Java.

Upvotes: 4

Views: 2542

Answers (1)

abaghel

Reputation: 15297

Consider your Dataset&lt;Row> ds1 has a value column with the following schema:

root
 |-- value: array (nullable = true)
 |    |-- element: array (containsNull = true)
 |    |    |-- element: double (containsNull = false)

Define your UDF1 function like below.

static UDF1<WrappedArray<WrappedArray<Double>>, List<Double>> getValue = new UDF1<WrappedArray<WrappedArray<Double>>, List<Double>>() {
    public List<Double> call(WrappedArray<WrappedArray<Double>> data) throws Exception {
        List<Double> doubleList = new ArrayList<Double>();
        // Convert each inner WrappedArray (a Scala Seq) to a java.util.List and collect it
        for (int i = 0; i < data.size(); i++) {
            doubleList.addAll(JavaConversions.seqAsJavaList(data.apply(i)));
        }
        return doubleList;
    }
};
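As a standalone check of the flattening logic, here is a plain-Java sketch that mirrors the UDF body using java.util lists in place of WrappedArray, so it compiles without Spark or the Scala library on the classpath; `get(i)` stands in for `data.apply(i)`, and the inner List stands in for the `seqAsJavaList` conversion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FlattenDemo {
    // Mirrors the UDF body: flatten a nested list into a single list.
    // In the real UDF, data.apply(i) plays the role of data.get(i), and
    // JavaConversions.seqAsJavaList converts the inner Scala Seq to a List.
    public static List<Double> flatten(List<List<Double>> data) {
        List<Double> doubleList = new ArrayList<Double>();
        for (int i = 0; i < data.size(); i++) {
            doubleList.addAll(data.get(i));
        }
        return doubleList;
    }

    public static void main(String[] args) {
        List<List<Double>> nested = Arrays.asList(
                Arrays.asList(2.0, 3.0),
                Arrays.asList(4.0));
        System.out.println(flatten(nested)); // prints [2.0, 3.0, 4.0]
    }
}
```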

Now register and call the UDF1 function as below.

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.callUDF;
import scala.collection.JavaConversions;

// register UDF
spark.udf().register("getValue", getValue, DataTypes.createArrayType(DataTypes.DoubleType));

// Call UDF
Dataset<Row> ds2 = ds1.select(col("*"), callUDF("getValue", col("value")).as("udf-value"));
ds2.show(false);
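If the goal from the question is the single element totals[0][1] rather than the whole flattened list, note that Java cannot use bracket indexing on a WrappedArray; the equivalent is chained apply calls, i.e. something like `totals.apply(0).apply(1)` in a `UDF1<WrappedArray<WrappedArray<Double>>, Double>` registered with DataTypes.DoubleType (untested sketch). A plain-Java sketch of the same indexing, with List standing in for WrappedArray:

```java
import java.util.Arrays;
import java.util.List;

public class NestedAccessDemo {
    // The question's totals[0][1]: second element of the first inner array.
    // On a WrappedArray the same access is totals.apply(0).apply(1);
    // here List.get stands in for apply.
    public static Double secondOfFirst(List<List<Double>> totals) {
        return totals.get(0).get(1);
    }

    public static void main(String[] args) {
        List<List<Double>> totals = Arrays.asList(Arrays.asList(2.0, 5.0));
        System.out.println(secondOfFirst(totals)); // prints 5.0
    }
}
```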

Upvotes: 5
