John Humanyun

Reputation: 945

Aggregation of multiple columns in spark Java

I have a list of columns, priceColumns, that is dynamic. I am trying to aggregate those columns in a Dataset:

public Dataset getAgg(RelationalGroupedDataset rlDataset) {
    Dataset selectedDS = null;
    for (String priceCol : priceColumns) {
        selectedDS = rlDataset.agg(expr("sum(cast(" + priceCol + " as BIGINT))"));
    }
    return selectedDS;
}

The above code is improper: each loop iteration overwrites selectedDS, so only the last column's aggregation survives. What I want is a single aggregation that covers every column present in the list. How can I write that generically? I'm completely stuck here.
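For concreteness, the loop is really just building one SQL expression string per column. That step can be isolated and tested without Spark at all; a minimal plain-Java sketch (class and method names are illustrative, not from Spark):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class AggExprDemo {
    // Builds one "sum(cast(col as BIGINT)) AS col" expression per price column,
    // so a single agg(...) call can later cover all of them at once.
    static List<String> buildAggExprs(List<String> priceColumns) {
        return priceColumns.stream()
                .map(col -> "sum(cast(" + col + " as BIGINT)) AS " + col)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> exprs = buildAggExprs(Arrays.asList("price", "tax"));
        exprs.forEach(System.out::println);
    }
}
```

Each string in the resulting list can then be wrapped with Spark's functions.expr and passed to a single agg call, rather than calling agg once per column.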

Upvotes: 1

Views: 639

Answers (1)

John Humanyun

Reputation: 945

I tried the following approach and it solved the problem.

// expr comes from org.apache.spark.sql.functions; Column from org.apache.spark.sql
List<Column> columnExpr = priceColumns.stream()
        .map(col -> expr("sum(cast(" + col + " as BIGINT))").as(col))
        .collect(Collectors.toList());

Then,

selectedDS = rlDataset
        .agg(columnExpr.get(0),
             JavaConverters.asScalaIteratorConverter(
                     columnExpr.subList(1, columnExpr.size()).iterator())
                 .asScala().toSeq());
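As an aside, Spark declares RelationalGroupedDataset.agg with Scala's @varargs annotation, so from Java you can usually also pass the tail as a Column[] array and skip the JavaConverters step. A runnable sketch of the head-plus-array call shape, using a stand-in varargs method since Spark is not assumed on the classpath here:

```java
import java.util.Arrays;
import java.util.List;

public class VarargsDemo {
    // Stand-in for RelationalGroupedDataset.agg(Column expr, Column... exprs):
    // a varargs method that just counts how many expressions it received.
    static int agg(String first, String... rest) {
        return 1 + rest.length;
    }

    public static void main(String[] args) {
        List<String> exprs = Arrays.asList("sum(a)", "sum(b)", "sum(c)");
        // Pass the head explicitly and the tail as an array. The same shape
        // would work with Column in Spark:
        //   agg(columnExpr.get(0), tail.toArray(new Column[0]))
        int n = agg(exprs.get(0),
                exprs.subList(1, exprs.size()).toArray(new String[0]));
        System.out.println(n); // prints 3
    }
}
```

Both forms invoke the same aggregation once with all the expressions, which is what the accepted Seq-based call achieves.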

Upvotes: 2
