SaleemKhair

Reputation: 541

Need help migrating from Spark 2.0 to Spark 3.1 - Accumulable to AccumulatorV2

I'm working on adding Spark 3.1 and Scala 2.12 support to the Kylo Data-Lake Management Platform.

I need help with migrating the following functions:

    /**
     * Creates an {@link Accumulable} shared variable with a name for display in the Spark UI.
     */
    @Nonnull
    static <R, P1> Accumulable<R, P1> accumulable(@Nonnull final R initialValue, @Nonnull final String name, @Nonnull final AccumulableParam<R, P1> param,
                                                  @Nonnull final KyloCatalogClient<Dataset<Row>> client) {
        return ((KyloCatalogClientV2) client).getSparkSession().sparkContext().accumulable(initialValue, name, param);
    }
    /**
     * Applies the specified function to the specified field of the data set.
     */
    @Nonnull
    static Dataset<Row> map(@Nonnull final Dataset<Row> dataSet, @Nonnull final String fieldName, @Nonnull final Function1 function, @Nonnull final DataType returnType) {
        final Seq<Column> inputs = Seq$.MODULE$.<Column>newBuilder().$plus$eq(dataSet.col(fieldName)).result();
        final UserDefinedFunction udf = new UserDefinedFunction(function, returnType, Option$.MODULE$.<Seq<DataType>>empty());
        return dataSet.withColumn(fieldName, udf.apply(inputs));
    }

They can be found here and here.

I'm adding a new Maven module, kylo-spark-catalog-spark-v3, to support apache-spark:3.1.2 and scala:2.12.10 at the time of writing.

I'm having trouble with:

  1. Creating an instance of AccumulatorV2, as the deprecation notice on the Accumulable class is not very clear. Here's my attempt at the first function - NOT COMPILING:
    @Nonnull
    static <R, P1> AccumulatorV2<R, P1> accumulable(@Nonnull final R initialValue, @Nonnull final String name, @Nonnull final AccumulatorV2<R, P1> param,
                                                  @Nonnull final KyloCatalogClient<Dataset<Row>> client) {
        AccumulatorV2<R, P1> acc = AccumulatorContext.get(AccumulatorContext.newId()).get();
        acc.register(((KyloCatalogClientV3) client).getSparkSession().sparkContext(), new Some<>(name), true);
        return acc;
    }
  2. Creating an instance of the UDF in the second function; UserDefinedFunction complains that it cannot be instantiated as it's an abstract class. Here's my attempt at the second function - COMPILING, but not sure if it makes sense:
    /**
     * Applies the specified function to the specified field of the data set.
     */
    @Nonnull
    static Dataset<Row> map(@Nonnull final Dataset<Row> dataSet, @Nonnull final String fieldName, @Nonnull final Function1 function, @Nonnull final DataType returnType) {
        final Seq<Column> inputs = Seq$.MODULE$.<Column>newBuilder().$plus$eq(dataSet.col(fieldName)).result();
        final UserDefinedFunction udf = udf(function, returnType);
        return dataSet.withColumn(fieldName, udf.apply(inputs));
    }

Can you please advise me on how to get this right, or point me to docs that are close to this case?

Upvotes: 0

Views: 1131

Answers (1)

ganczarek

Reputation: 11

1. accumulable


Older versions of Spark had two APIs for accumulators: Accumulable (when input and output types were different) and Accumulator (when input and output types were the same, i.e., Accumulable<T,T>). To create an Accumulable<OUT, IN> instance you needed an AccumulableParam<OUT, IN> that defined the "merge" operation (OUT+OUT), the "add" operation (OUT+IN) and the zero value of type OUT.

AccumulatorV2 is organised differently. It's an abstract class that needs to be extended with add and merge operations, plus a notion of the zero value (isZero, reset, copy). You can see an example implementation, CollectionAccumulator, in Spark's source code.
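
For illustration, here is a minimal sketch of a custom AccumulatorV2: a hypothetical StringListAccumulator that collects strings into a list, similar in spirit to CollectionAccumulator (names are illustrative):

    import org.apache.spark.util.AccumulatorV2;

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical example: accumulates String inputs (IN) into a List<String> (OUT).
    public class StringListAccumulator extends AccumulatorV2<String, List<String>> {

        private final List<String> values = new ArrayList<>();

        @Override
        public boolean isZero() {
            return values.isEmpty(); // "zero" means nothing has been collected yet
        }

        @Override
        public AccumulatorV2<String, List<String>> copy() {
            final StringListAccumulator copy = new StringListAccumulator();
            copy.values.addAll(values);
            return copy;
        }

        @Override
        public void reset() {
            values.clear(); // back to the zero value
        }

        @Override
        public void add(String v) {
            values.add(v); // the OUT + IN operation
        }

        @Override
        public void merge(AccumulatorV2<String, List<String>> other) {
            values.addAll(other.value()); // the OUT + OUT operation
        }

        @Override
        public List<String> value() {
            return new ArrayList<>(values);
        }
    }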

The responsibility of the accumulable function changes with the AccumulatorV2 API. It doesn't need to create the accumulator anymore; it just needs to register it with the SparkContext under a given name. I think the following would make sense:

    @Nonnull
    static <R, P1> AccumulatorV2<P1, R> accumulable(
        @Nonnull final R initialValue, // unused
        @Nonnull final String name,
        @Nonnull final AccumulatorV2<P1, R> acc,
        @Nonnull final KyloCatalogClient<Dataset<Row>> client
    ) {
        ((KyloCatalogClientV3) client).getSparkSession().sparkContext().register(acc, name);
        return acc;
    }
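
A hypothetical call site, assuming the StringListAccumulator sketch above and an existing KyloCatalogClient<Dataset<Row>> client, could look like this:

    // Register a custom accumulator under a display name (sketch; names are illustrative).
    final StringListAccumulator acc = new StringListAccumulator();
    final AccumulatorV2<String, List<String>> registered =
            accumulable(Collections.<String>emptyList(), "collected-strings", acc, client);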

A few things to note:

  • The order of type parameters changed: Accumulable<OUT, IN> vs AccumulatorV2<IN, OUT>
  • initialValue needs to be passed to the accumulator at creation time. However, you need to make sure that copyAndReset returns a new accumulator with the zero value and not initialValue (see the sketch after this list). Alternatively, you could add a value of type P1 that, once added to the accumulator, makes it return the expected initial value of type R. It's not a very good idea, just an alternative.
  • Also, don't use AccumulatorContext, because, as per the documentation, it's

An internal class used to track accumulators by Spark itself.
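
To make the copyAndReset point concrete, here is a short sketch building on the hypothetical StringListAccumulator above: the initial value is seeded through a constructor, while copyAndReset explicitly returns a fresh zero accumulator rather than a re-seeded one.

    // Sketch: seed initialValue at construction time, but keep copyAndReset returning
    // a zero accumulator as AccumulatorV2 requires.
    public class SeededStringListAccumulator extends StringListAccumulator {

        public SeededStringListAccumulator(List<String> initialValue) {
            initialValue.forEach(this::add); // seed with the caller's initial value
        }

        @Override
        public AccumulatorV2<String, List<String>> copyAndReset() {
            return new StringListAccumulator(); // zero value, NOT initialValue
        }
    }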

2. map


I think your function is correct and makes sense. Between Spark v2.0 and v3.1, UserDefinedFunction changed to an abstract class, and you now create instances with the udf function (import static org.apache.spark.sql.functions.udf).
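
For reference, a minimal sketch of that pattern using the Java UDF1 overload of functions.udf (the column name and function are purely illustrative):

    import static org.apache.spark.sql.functions.udf;

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.api.java.UDF1;
    import org.apache.spark.sql.expressions.UserDefinedFunction;
    import org.apache.spark.sql.types.DataTypes;

    // Hypothetical example: upper-case the "name" column in place.
    static Dataset<Row> upperCaseName(final Dataset<Row> dataSet) {
        final UserDefinedFunction toUpper = udf(
                (UDF1<String, String>) s -> s == null ? null : s.toUpperCase(),
                DataTypes.StringType);
        return dataSet.withColumn("name", toUpper.apply(dataSet.col("name")));
    }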

Upvotes: 1
