John Humanyun

Reputation: 945

GroupBy and Aggregate Function In JAVA spark Dataset

I am trying to perform the operation below on a Dataset: group the rows and aggregate the column expend by summing it. But this isn't working on a normal Dataset; groupBy returns a RelationalGroupedDataset. How can I achieve the operation below and get back a normal Dataset?

dataset.select(col("col1"), col("col2"), col("expend")).groupBy(col("col1"), col("col2"), col("expend")).agg(sum("expend"))

The equivalent SQL query is: select col1, col2, SUM(expend) from table group by col1, col2

The columns get repeated when I try this code: dataset.columns() gives me [col1, col2, expend, expend]. Is this approach right?
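The duplicate column appears because expend is listed both as a grouping key and as an aggregate. To illustrate the intended semantics of the SQL above (group by col1 and col2 only, then sum expend), here is a minimal plain-Java sketch using streams instead of Spark; the Row record and groupAndSum helper are hypothetical names, not part of any Spark API:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupBySum {
    // One input row: two grouping keys and a numeric value to aggregate.
    public record Row(String col1, String col2, double expend) {}

    // Plain-Java analogue of:
    //   select col1, col2, SUM(expend) from table group by col1, col2
    // Note that expend is NOT part of the grouping key -- only col1 and col2 are.
    public static Map<List<String>, Double> groupAndSum(List<Row> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                r -> List.of(r.col1(), r.col2()),          // grouping key
                Collectors.summingDouble(Row::expend)));   // aggregate
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("a", "x", 10.0),
                new Row("a", "x", 20.0),
                new Row("b", "y", 5.0));
        System.out.println(groupAndSum(rows));
    }
}
```

The Spark fix follows the same shape: keep expend out of groupBy and only reference it inside agg.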

Upvotes: 3

Views: 16275

Answers (1)

John Humanyun

Reputation: 945

I used the code below to solve the issue. I created a List<Column> aggCols that holds the aggregate expressions for the columns. Here I added:

aggCols.add(expr("sum(expend1)"));
aggCols.add(expr("sum(expend2)"));

dataset.select(col("col1"), col("col2"), col("expend1"), col("expend2"))
    .groupBy(col("col1"), col("col2"))
    .agg(aggCols.get(0), JavaConverters.asScalaIteratorConverter(aggCols.subList(1, aggCols.size()).iterator()).asScala().toSeq());

I also added a check: when there is only one column to sum, I call sum on it directly.
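The head/tail split above is needed because the agg overload takes one leading Column plus a Seq of the remaining ones. Here is a small sketch of that split using plain strings instead of Spark Columns; splitHeadTail is a hypothetical helper name, not a Spark method:

```java
import java.util.List;
import java.util.Map;

public class AggArgsDemo {
    // Split the aggregate list into a head element (the first argument to agg)
    // and a tail list (converted to a Seq for the second argument).
    public static Map.Entry<String, List<String>> splitHeadTail(List<String> aggCols) {
        if (aggCols.isEmpty()) {
            throw new IllegalArgumentException("need at least one aggregate expression");
        }
        // subList(1, size) is the "rest" that would be converted via JavaConverters.
        return Map.entry(aggCols.get(0), aggCols.subList(1, aggCols.size()));
    }

    public static void main(String[] args) {
        List<String> aggCols = List.of("sum(expend1)", "sum(expend2)");
        Map.Entry<String, List<String>> split = splitHeadTail(aggCols);
        System.out.println(split.getKey());   // head: sum(expend1)
        System.out.println(split.getValue()); // tail: [sum(expend2)]
    }
}
```

With a single aggregate the tail is simply empty, which is why the answer's single-column check can skip the conversion and call sum directly.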

Upvotes: 6
